This is Jessica. “Scientific doomerism” seems to be everywhere lately, from a presidential statement that promises to restore “gold standard science” from the top down because scientists have botched things, to journals being inundated with AI-produced papers, to sleuths like Reese Richardson documenting the scale of organized scientific fraud through paper mills and collusion. In his post on this last example, Andrew wrote something that caught my attention:
And, yes, typing some prompts into a chatbot and producing a paper is fraud, in the same way that publishing textbook excerpts as if it were new research is fraud, or copying from wikipedia as if it were new research is fraud, etc etc. It doesn’t require fake data and it doesn’t require some cackling Snidely Whiplash attitude. It can be some schlub sitting at a computer terminal who wants to get his contract extended or get admitted to a Ph.D. program or whose adviser is pressuring him to get some publications . . . But it’s fraudulent publication, not the same as bad research (which is actually research, it just happens to be useless because bad measurement and kangaroo).
There are clearly some differences between passing off work containing fake evidence you purchased from paper mills and work that you contracted AI to do for you–in one case you are paying money for someone to pass something fake off as real with your name on it, whereas in the other you might be using actual data and think you’re just saving time, especially if you’re reviewing what the AI does. So should we really consider both forms of fraud?
It struck me how sharply Andrew’s perspective contrasts with the current direction of discussions among ML and other researchers interested in AI for science. There, it’s seen as inevitable and not necessarily morally problematic that the future of science will have humans largely playing the role of curators, who prompt and select among results produced by LLM agents that do the bulk of the work. It’s worth considering what kinds of ethical lines this crosses exactly.
Let’s imagine that I give an LLM an initial high level research question related to a topic on which I am knowledgeable. It churns on the idea and ultimately designs an experiment it’s happy with. I review the plan before prompting it to continue, maybe tweaking slightly, like changing a condition or suggesting an additional robustness check. It then gathers data on my behalf (e.g., running an online experiment or downloading existing datasets), conducts the analysis, and presents me with the results. I review these and then give it permission to write up a paper. I read the final paper to make sure I know what it’s saying before I submit. Maybe I change a few things I don’t agree with. I add my name and also credit the AI. In other words, there is a light human touch throughout, but much of what is presented as my work comes from the model.
From an “optimistic” AI-for-science perspective, the strongest argument is probably to cast it as part of the scientist’s job to try to make the most of current technology. If we think AI might help us be more productive, then we should explore how much time it can save us, just like it was a good move for statistics to embrace the computational revolution that made previously intractable models commonplace. Proponents of AI for science argue that it is irresponsible not to use AI given its current capabilities, just like it could be construed as irresponsible for a brilliant researcher to refuse to use calculators if doing so meant they could contribute more useful advances to the field. Of course, this assumes that we won’t be sacrificing anything vital in the process.
The “pessimistic” view of AI generated science as fraud holds that we are sacrificing something vital in the process. But what is it exactly? If you believe that “the devil is in the details” (or “God is in every leaf of every tree”, depending on whose side you want to be on) then whenever you outsource decisions you would otherwise make yourself, you have potentially compromised the work from the perspective of your own expert judgment. So putting your name on it betrays what you know to be true of good science. Of course, you could check everything down to the lowest level, and intervene whenever the agent tries to do something you don’t agree with. Then the AI is really just a means of computation–even if you use it for brainstorming what research questions to ask, you could view it as a way of extending your limited resources but without sacrificing your own scientific judgment. This requires that you are knowledgeable enough to assess everything it does. Assuming you are, then it seems hard to argue that this is fraud. Though admittedly, a lot rests on how careful you are when you check things over.
Part of the concern may be that AI makes it tempting to extend your methods or claims outside of what you know well. Without the option of using it, you would have had to do the research yourself (and presumably gain understanding in the process) in order to apply that method. Relatedly, I suspect most people would agree that when you are in a training context, like taking courses in grad school, turning in work that was largely driven by the AI is a form of fraud because it holds you back from gaining the understanding yourself.
Another version of the fraud argument focuses on misattribution of ideas. Maybe this is what Andrew had in mind when he wrote “publishing textbook excerpts as if it were new research is fraud, or copying from wikipedia as if it were new research is fraud.” If the AI produced significant parts of the contribution, like the specific hypothesis, the choice of methods, and the framing of the contributions, then those aren’t your novel creations and adding yourself as author is misattribution. But this is tricky because human scientists also recombine existing ideas, methods, and frames constantly, and we often call the resulting combinations original contributions. So the question isn’t whether ideas are derivative per se but when the degree of derivation crosses into infringement. We have copyright law for some creative domains, and we can try to formulate when AI outputs are permissible in light of this (e.g., Annie Liang has some recent work on this). But to say AI-assisted papers are fraudulent on such grounds, we’d need to work out what the scientific analog of substantial similarity is. This is hard because science explicitly values building on prior work. But we can agree on some things, like you shouldn’t do something that is too close to others’ work without citing them.
A final angle on why it’s fraud might be that it misrepresents what science is more broadly. If you think the reasons to do science are fundamentally human–that as scientists we are concerned with producing understanding for ourselves just as much as we’re concerned about improving things in the world–then you could argue that for science to be meaningful we have to be the ones coming up with the ideas and shaping them as we go. From this perspective, automated science isn’t inherently wrong, it’s just missing the point. AI for science arguments often completely overlook the “people production” role of science. In the extreme, they envision AI finding solutions for lots of real world problems and intervening to control outcomes in the world without us understanding how any of it works. In reality, the personal side–including the search for personal fulfillment through science–is a big part of why smart people who could potentially make a lot more money in applied roles end up choosing research careers. And it’s a big part of how we evaluate scientists. How many of your Ph.D. students have gone on to competitive research positions? What does the trajectory of topics you’ve worked on say about your research taste?
Pushback to this argument might point out that by saying science is entirely a matter of human careers, we contradict claims that we as scientists like to make, about how we are dedicated to improving the state of the art in our field, or producing value for the world. Would it still be science as we know it if we started acknowledging that it’s really about personal fulfillment for scientists? But I think this is a bit of a false dichotomy. The public value of science depends on there being humans who find the work meaningful enough to do it well, including pushing back on sloppy results, exercising their taste to shape the direction in their field, training students worth training, etc. Careless AI use can threaten this by flooding the system with outputs that crowd out careful work, making the people who would be intrinsically motivated to do quality work less likely to stick around. It also implicitly reframes science as nothing more than a pipeline for results.
My view is that AI use can go either way, depending on how you approach it. What best determines whether it’s fraud or not is the attitude you bring. It can help you do less fraudulent research if you’re the kind of person who is already very picky about what you send out to the world. But it can help you fool yourself and others if you let competitiveness and obsession with metrics drive how you use it.
As a final comment, there’s some irony in using terms like “optimism” to talk about this. I described the pro-automated science argument above as “optimistic,” because I think that’s how many in this camp see themselves–as fundamentally optimistic about the future of science and our ability to improve it by using AI. But the underlying motivation to figure out how to produce papers with as little human oversight as possible is also often deeply pessimistic. A common narrative is the “review death spiral”: AI production stresses the review system, which increases the noisiness of paper acceptance decisions, which further incentivizes submitting sloppy AI produced papers. The answer is presumed to be putting more AI in place on both sides. The idea that scientists have agency and could continue to shape the meaningfulness of what gets produced starts to seem out of the question.
Increasingly, a lot of the most enthusiastic pro-AI discourse (including for science) strikes me as nihilism masquerading as optimism. We have people who perceive themselves as huge optimists that will reshape science or society for the better simultaneously lacking the imagination to see beyond their own technological determinism. It reminds me a bit of the “optimism” associated with some open science and science reform positions, who also suggest that we just need the right technology to fix the problems (though in this case, it’s heuristics like replication or preregistration). It’s a fundamentally non-agentic view of human scientific endeavor.
I got a slightly different impression from Andrew’s post: by comparing chatbots to paper mills and copied text, I think he was treating them just as a way to generate text to fill an article, without regard to the content or whether there is any real data involved.
Thanks for what is a very balanced account. I agree completely with the training issue, as I suspect everyone does. And I agree that the misattribution issue is tricky. But on your first point: I have often heard the current generation of AI described as “a moderately bright and enthusiastic graduate student.” Aren’t the tasks in the initial hypothetical you describe exactly what such a human graduate student would do, with the same supervision, and the same credit, with the senior taking pride of place and crediting the graduate student, who now has a publication? There are a large number of issues to be confronted when future LLM iterations are considerably more advanced than “a moderately bright and enthusiastic graduate student.” But for now, I’m not certain I see the problem, beyond the fact that the less sociable senior researchers can now get grad student-level collaboration without having to talk to anybody.
Yes, I think this is a good analogy–it’s similar to how lots of papers are produced by grad students with faculty supervision, so there’s a question of why we would see it as more fraudulent if it’s AI instead of a student.
I think the main problem is that there is no reasoning/training in the “supervision” of an LLM (if we ignore all the unethical, exploitative work in the training of the model).
I don’t understand your point. Are you saying that nobody is trained to do this, or that there is no reasoning involved? If the latter, I don’t agree. In my use, there is plenty of critical reasoning and direction by me in getting AI to do tasks that I want.
I don’t know how much work you’ve done with these models, but they actually require quite a bit of initial supervision. Whether it’s more or less than you’d give a human, I can’t say.
Quote from the blog post: “It reminds me a bit of the “optimism” associated with some open science and science reform positions, who also suggest that we just need the right technology to fix the problems (though in this case, it’s heuristics like replication or preregistration). It’s a fundamentally non-agentic view of human scientific endeavor.”
“It’s due to the incentives!” or some sentence like that may also fit with this non-agentic view, if I am understanding that term correctly. It always annoys me to hear, possibly because I tend to be more focused on the individual and individual choices, options, responsibilities, etc. It is perhaps also not a coincidence that many of the open science reform people seem to me to be social psychologists, and talk a lot about collaboration, and doing things in large groups together, and things like that. I tend to dislike groups of people, group processes, and such things.
Anyway, I recently came across a paper by Flis (2019) titled “Psychologists psychologizing scientific psychology: An epistemological reading of the replication crisis”. The paper seemed complicated to me in certain ways when I quickly read through it, but it had some interesting thoughts about certain science reformers’ possible (implicit?) reasoning, which may be interesting for you, or others, thinking about all this. I was reminded of it after reading the section of the blog post quoted above. Here’s an excerpt from Flis (2019):
“To sum up, this section showed how the indigenous epistemology at work in the current debates is that of the biased mind of the scientist. The reformers employ a concept of individual rationality that has ontological implications—scientists think and act in a biased way because they are human, and the science system should be set up in such a way that it complements human (ir)rationality. The cause for reform is the way scientific psychology works on the level of institutionalization of research and publication practices. Currently, the science of psychology not only fails at correcting individual biases, but it reifies them into peer-reviewed literature. For the reformers, humans are fundamentally irrational, so much so that this threatens the very functioning of science.” (p. 167)
And to mention another thing, I was reminded of the recent questionnaire about choosing scientific virtues (or something like that) after reading the following section in the blog post:
“If you think the reasons to do science are fundamentally human–that as scientists we are concerned with producing understanding for ourselves just as much as we’re concerned about improving things in the world–then you could argue that for science to be meaningful we have to be the ones coming up with the ideas and shaping them as we go.”
I was wondering about the purpose of science, and the role of understanding at the time of the blog post about the questionnaire, and wondered whether some discussion about “understanding” might be part of a possible final paper. I reasoned at the time that from a certain perspective, AI can perhaps not truly be seen as producing science (or whatever words are most appropriate) because it simply doesn’t truly “understand”. Then I searched a bit whether computers can truly comprehend and understand and quickly stopped because that was way too much for me to dive into. I still wonder whether a possible paper involving the results of the virtues questionnaire might discuss this possibly relevant issue concerning understanding, or whether the questionnaire relates to that somehow.
I have seen that Flis paper actually, I recall coming across it a few years ago while thinking about how the narratives of the replication crisis can be counterproductive.
Seeking understanding seems very related to the virtues project. Maybe curiosity is the closest, but with slightly different emphasis if you think of curiosity more like an innate desire to explore new things versus a need to fully understand the things you are talking about.
Jessica:
Your post reminds me of how we spend so much effort trying to automate or routinize science, even setting aside chatbots, AI, etc. The textbooks and software we write, the case studies we carry out, our research on research methods: these are all attempts to make science more automatic, in the good sense that we are trying to develop tools and cultural “muscle memory” to allow more of the steps of science to be chunked, so that we can do our science at a higher level. From that perspective, the chatbot is just one more tool.
There is also a challenge, which is that there are many ways of doing science. Presumably one could program a chatbot to follow the pattern of trash science such as the papers in Psychological Science in the 2010-2015 period. Or, to put it another way, my colleagues and I wrote all these textbooks in part to counteract various bad advice in other textbooks.
I recently attended a workshop at the Isaac Newton Institute (Cambridge) on AI for Mathematics. Two main topics were discussed: (a) formalization of mathematical theorems and proofs so that they can be checked by LEAN (and related proof checkers) and (b) tools like AlphaEvolve that can help generate examples of mathematically interesting phenomena which (sometimes with help from machine learning) mathematicians then analyze to identify patterns and form conjectures. However, one senior mathematician said that while he finds these tools (some of them running as “agents”) very useful, his students bring him “results” that they think are correct and he quickly determines are wrong.
A similar pattern may be occurring in software engineering. While the early “smart autocomplete” systems were primarily useful for junior software engineers, it is now the most senior engineers who are able to be most effective with agentic programming. This seems to be because they have the expertise to be able to check the code and detect flaws.
With current AI technology, I think that only very routine “normal science” (sensu Kuhn) can be fully automated. Everything else requires expert supervision. This is fundamentally unsurprising, because AI systems have difficulty detecting and reasoning about novelty, and that is precisely the point of non-routine science!
At ArXiv (where I chair the CS section), we see lots of AI slop papers doing very routine and boring stuff. This may be partly by design, as the purpose of some papers is just to boost citation counts rather than to have anyone actually read the paper.
I would like to hereby reserve the right to be both nihilistic and optimistic. The nihilism comes from seeing how the typical science sausage is made; the optimism comes from the fact that I am free to give my life and science its own meaning, and that therefore even if my outputs have little impact (whether the outputs are AI-assisted or not), I am fulfilled.
+1 though science sausages, however fraught, problematic and messy the process, pretty often result in a nice sausage
I think you underestimate the damage to training that the use of AI in paper generation may cause.
https://ergosphere.blog/posts/the-machines-are-fine/
As before, I remain of the opinion that LLMs are bogus, terrible, horrible, intellectually vacuous, and a bad idea. But the genie seems out of the bottle.
A nasty aspect of LLM use is that these things generate more errors than people can possibly find, and using them makes us worse at finding errors.
If “Bob” is such an awful new phenomenon, why is there so much dross in the literature pre-AI?
That was an interesting blog post. Here’s my takeaway: “The real threat is a slow, comfortable drift toward not understanding what you’re doing.” I agree that there are serious things to worry about here. But I can’t help but think that the statement literally applies to all technology humans have developed (my car, my credit cards, my thermostat, ….). Perhaps AI is the continuation (and acceleration) of a trend we have been underestimating the dangers of for a long time. I’m willing to believe that is worth thinking about.
On the other hand, the more I use AI the more I find myself engaged in critical thinking in more meaningful ways than I have in a long time. It is different of course. Instead of trying to relearn matrix algebra well enough to derive appropriate standard errors from surveys with complex stratified sampling schemes, I find myself immersed in understanding how survey weights impact standard errors and why ignoring the weights leads to overestimating the precision of estimates. I’ve gained an intuitive understanding that the matrix algebra would not have given me. I’ve lost one kind of understanding but gained another.
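To make the point about weights and precision concrete, here is a minimal sketch, assuming Kish's approximation for the effect of unequal weights. It is only an illustration: the function names and the toy weights are hypothetical, the approximation covers unequal weighting alone (not stratification or clustering), and it assumes the weights are unrelated to the outcome being measured.

```python
import math


def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    return (total * total) / sum_sq


def kish_design_effect(weights):
    """Approximate design effect from unequal weighting: n / n_eff."""
    return len(weights) / kish_effective_n(weights)


def se_inflation(weights):
    """Factor by which a naive standard error (one that treats the data
    as a simple random sample of size n) would need to be multiplied
    under this approximation."""
    return math.sqrt(kish_design_effect(weights))


# Toy example (hypothetical weights): six respondents, two of whom
# each stand in for four people in the population.
toy_weights = [1, 1, 1, 1, 4, 4]
n_eff = kish_effective_n(toy_weights)
deff = kish_design_effect(toy_weights)
inflation = se_inflation(toy_weights)
```

With equal weights the effective sample size equals the actual sample size and nothing changes. With the toy weights above, six respondents carry roughly the information of four, so a standard error computed as if there were six independent, equally weighted observations overstates the precision of the estimate.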
I don’t mean to belittle the concern about AI impeding the development of understanding. There are real dangers here and I believe these to be likely to occur. But that will be because people will misuse these tools and because our educational system fails to meaningfully teach people how to use them. From my experience, LLMs make errors in the same ways that people do – which isn’t surprising since they are trained on materials created by humans. They can generate errors quickly and in abundance compared with humans. But they also can generate more complex and realistic analysis compared with humans. I don’t share David’s dismissal of AI though I do share many of his concerns about bad things that are likely to happen.
The guy from the paper mill is just using an LLM, so it will be interesting to see if that profession isn’t negatively impacted by AI. Theoretically, everyone becomes their own paper mill. I knew a guy at university who would write papers for other students (done in one night, but guaranteed good enough to pass). I guess this kind of side hustle has gone the way of the dodo.