PPNAS!

Four different people pointed me to this article by biologist Elisabeth Bik about a distressing problem in biology research: published articles that include fraudulent images created by photoshopping and similar techniques.

It kinda makes me wonder: when scientists commit out-and-out fraud, what are they thinking? My guess is that the attitude of the fraudsters is a mix of three things: (1) “Everybody does it,” (2) the idea that they are acting in the service of a higher truth and details about data are just a form of paperwork or hoops that they need to jump through, and (3) the idea that all science is speculative, science is self-correcting, so what’s the big deal anyway? Attitudes (2) and (3) kind of contradict but I think they’re both out there, even in the same people sometimes. The first line of defense is to claim that everything is replicable; the second line of defense is to say that non-replication is no big deal.

The point of this speculation on my part is not to excuse cheating in science, or to excuse those researchers who, when problems are pointed out in their work, respond that the substance of their claims is unaffected. Rather, I’m just trying to figure out the ways in which outright cheating fits into more general pathologies of scientific practice.

In any case, my favorite reaction to this article came from Greg Mayer, who noticed the sentence, “The prestigious Proceedings of the National Academy of Sciences had published the retracted articles,” and wrote:

She forgot to capitalize the “P” in “prestigious”!

Indeed.

30 thoughts on “PPNAS!”

  1. Andrew mentions Elisabeth Bik, so go to

    https://www.buzzfeednews.com/article/stephaniemlee/elisabeth-bik-didier-raoult-hydroxychloroquine-study

    to see her criticism of Didier Raoult, a French hydroxychloroquine crusader. Along the way, you will note that Raoult is popular beyond belief–indeed, far in excess of any (and all?) of the customary contributors to this blog:

    “According to one survey, by late March, Raoult had become one of France’s most popular ‘political personalities,’ with particular appeal on the populist extremes. Votives [*] bearing his image were being sold in Marseille, and on some evenings, at 8 p.m., a battalion of municipal garbage trucks assembled on the roadway outside his hospital, where the drivers leaned on their horns in loud and furious tribute. A hundred-foot banner, painted by a club of local soccer fans and strung up near the entrance, read, ‘Marseille and the world behind Prof. Raoult!!!’”

  2. It is easy to come up with a reason to throw out any given western blot. I’d guess that for every three published there are another one to three that got thrown out. To be clear, these are *legitimate* reasons. The problem is that those reasons are easily ignored when the expected/desired results appear.

    The representative images are also treated as a kind of window dressing that is supposed to look nice, with cleanly shaped bands and little background.

    The “dynamite plot” quantifications are considered the actual results, so the image is kind of an afterthought. It wouldn’t surprise me if a lot of these started as someone losing or poorly documenting the images in the years between experiment and publication.

    Of course, there is no reason to limit ourselves to representative images in the age of online publishing. Just put them all in the supplements, or explain why they are missing.
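
    A toy sketch of how that filter can bias results even when every individual discard is legitimate (all the numbers here are invented for illustration): say the true effect is zero and some blots really do have technical problems, but the problems only get hunted for when the result looks “wrong.”

        import numpy as np

        rng = np.random.default_rng(1)
        n_experiments = 1_000
        effects = rng.normal(0.0, 1.0, n_experiments)  # true effect is zero; all spread is noise
        flawed = rng.random(n_experiments) < 0.3       # assume 30% of blots have a genuine technical problem

        # The flaw is only noticed (and the blot thrown out) when the result
        # disagrees with expectations; expected-looking results sail through.
        kept = ~(flawed & (effects < 0))
        print(f"mean of all experiments:  {effects.mean():+.3f}")        # about 0
        print(f"mean of kept experiments: {effects[kept].mean():+.3f}")  # biased upward

    Every discard in that sketch had a defensible reason; the bias lives entirely in when you go looking for one.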

    • Another related aspect is that blinding generally is not taken seriously in molecular biology. It is believed to be difficult to bias the results as long as proper controls are present. This may even be true for any individual experiment, but meanwhile there’s the huge filter at the group level described above.

    • According to Bik’s article, the results are from retracted papers, so I don’t think it likely that the authors had “actual results” that validated their claims. That doesn’t mean that some aspects of the presentation of the results aren’t taken more seriously than others by researchers, but it does make it unlikely that these particular papers just had bad figures to which little attention was paid.

      • According to Bik’s article, the results are from retracted papers

        Your premise is incorrect. Bik wrote:

        Unfortunately, many scientific journals and academic institutions are slow to respond to evidence of image manipulation — if they take action at all. So far, my work has resulted in 956 corrections and 923 retractions, but a majority of the papers I have reported to the journals remain unaddressed.

        https://www.nytimes.com/interactive/2022/10/29/opinion/science-fraud-image-manipulation-photoshop.html

        • Bik presented evidence in her Times article for specific papers, and of these she wrote, “All of these images have been taken from papers that were retracted after I reported concerns about image manipulation.” So, it is indeed unlikely that these particular papers just had bad figures to which little attention was paid.

        • Maybe, but that is a small portion of the total. I mean, I have personally reported a duplicated blot that led to a correction, and I really only did such research for a few years.

          Most will just skim over those figures and not consider them carefully, so there are probably lots of such errors/frauds out there.

          There are also the ones where they describe something that is not indicated by the blot, and you have to wonder. One of the early Montagnier (or maybe Gallo) HIV papers is like that: you clearly see bands for HIV proteins in the control, but the text ignores this.

  3. I’ve often thought that interviewing the people who commit fraud to answer the question, “When scientists commit out-and-out fraud, what are they thinking?” would be a great, and very useful, psychology project. Has anyone done this?

    I also continue to be amazed that funding agencies seem barely to care, putting zero effort into investigating fraud. I don’t think they condone fraud, of course, but it’s consistent with a general lack of interest in working on anything having to do with the structure of science (overproduction of Ph.D.s, universities as the focus of research activity, etc.).

    • Raghu:

      There’s a whole field of science-studies, philosophy-of-science, history-of-science, sociology-of-science, and I guess they get some NSF funding, and I guess some of that is about the replication crisis and fraud. I don’t know, though. Also, Brian Nosek and the Center for Open Science get lots of funding, and I guess they look into this stuff too.

      I agree that a qualitative study of fraud would be interesting. It might be difficult to get some of these people to agree to an interview, though, as first they’d have to admit they did something wrong. Also recall Clarke’s Law: any sufficiently crappy research is indistinguishable from fraud. It would probably be pretty difficult to get the beauty-and-sex-ratio researcher, the ovulation-and-voting researchers, the gremlins guy, etc., to sit down and discuss why they are doing such crappy work, even if we made it clear that we weren’t accusing them of fraud.

      • I can’t find the story in a few minutes of searching, but I think I recall that there was once a guy who made the talk show rounds telling people about how he had survived for years as an impostor: for a while he claimed to be a doctor and somehow got paid or got loans that way, then he went somewhere else and pretended to be a movie director or something…and then it turned out he had never done any of those things, he was only _posing_ as an impostor. (I definitely heard this story but it sounds apocryphal and probably was.)

        I’m imagining getting one of these grants to investigate the psychology of scientific fraudsters, and then fabricating the data instead of actually doing the work.

        • Sounds like “Catch Me If You Can”:

          Abagnale claims to have worked as an assistant state attorney general in the U.S. state of Louisiana, a hospital physician in Georgia, and impersonated a Pan American World Airways pilot who logged over two million air miles by deadheading.[4] The veracity of most of Abagnale’s claims has been questioned, and ongoing inquiry continues to confirm that they were made up.[9][10][11] In 2002, Abagnale admitted on his website that some facts had been over-dramatized or exaggerated, though he was not specific about what was exaggerated or omitted about his life.[12] In 2020, journalist Alan C. Logan provided evidence he claims proves the majority of Abagnale’s story had been invented or at best exaggerated.[5][6][7] The public records obtained by Logan have since been independently verified by journalist Javier Leiva.[13]

          https://en.wikipedia.org/wiki/Frank_Abagnale

    • Imagine you have the numbers in your Excel file, but when it comes time to publish you can’t find the flash drive with the images you made three years ago. Or they all look kinda crappy and might bring up annoying questions from reviewers.

      The representative image is just a form of marketing anyway; the statistics are what make a result “real,” and you have those in your spreadsheet. So why not just choose/make an idealized image of what you are trying to convey?

      Then lots of people start doing this kind of thing, and everyone else has to do what they must to survive.

      There is really an argument to be made for trusting results from independently wealthy researchers over those who need the job.

  4. I’ve begun to look at a lot of professionals (journalists, scientists, etc.) as content creators competing in the attention economy. As with entertainers, bloggers, podcasters, etc., it’s no longer pragmatic to assume a priori that any of the content created is “true.” The incentive structure across the board seems to be converting attention to $ (through advertising, grants, or prestige), so DYOR (do your own research) needs to be the default mode for consuming any content.

    • +1. We basically have millions of attention-seekers at millions of typewriters, plus thousands of genuine truth seekers and serious scientists at their typewriters. In this landscape, it would be crazy not to think about the forces that directed our attention to a given piece of writing before considering whether the writing might actually have merit. If a story has that just-so tidiness, a neat counter-intuitive hook, and so on, then you know it could easily have found its way in front of you solely on those characteristics, so the probability of it also happening to have significant scientific merit is pretty low.

      Doesn’t even matter if the authors themselves aren’t p-hackers – the click-rewarding algorithms that elevate flashy results are the greatest p-hackers of all!
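
      Here’s a minimal simulation sketch of that selection effect (all the numbers are invented for illustration): generate thousands of pure-noise studies, let the “algorithm” surface only the flashiest one percent, and every surfaced result looks statistically significant even though nothing is real.

          import numpy as np

          rng = np.random.default_rng(0)
          n_studies, n = 10_000, 50                # hypothetical: 10,000 studies, 50 subjects each
          data = rng.normal(0, 1, (n_studies, n))  # true effect is exactly zero everywhere
          z = data.mean(axis=1) / (data.std(axis=1, ddof=1) / np.sqrt(n))

          surfaced = np.sort(np.abs(z))[-n_studies // 100:]  # the "algorithm" keeps the top 1% by |z|
          print(f"all studies with |z| > 1.96:      {np.mean(np.abs(z) > 1.96):.1%}")  # ~5%, as expected under the null
          print(f"surfaced studies with |z| > 1.96: {np.mean(surfaced > 1.96):.1%}")   # 100%: the filter did the p-hacking

      No author in that sketch hacked anything; the selection alone manufactures the “findings.”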

      • And all this is going to get a lot worse with ChatGPT and other AI-augmented content creation. I can imagine perfect-looking studies with no connection to reality getting produced in a matter of minutes or hours with these new technologies. The supply incentives are there. These studies will probably get into credible journals too, followed by a TED talk with 5 million views. I’d wager we’ll have an AI version of the Sokal affair soon.

        I think the big tell is going to be unavailable data and code (a leading indicator) and irreproducible papers (a lagging indicator). But the problem is that neither of these is considered serious by many fields.

        Both serious journalism and science will have to move toward a much more open-science/reproducible-work model.

        • Gary Marcus is on the case (sorry about the ugly link):

          https://garymarcus.substack.com/p/ais-jurassic-park-moment?utm_source=post-email-title&publication_id=888615&post_id=89698393&isFreemail=true&utm_medium=email


          “But at the same time it is, or should be, terrifying. It is no exaggeration to say that systems like these pose a real and imminent threat to the fabric of society.

          The core of that threat comes from the combination of three facts:

          • these systems are inherently unreliable, …
          • they can easily be automated to generate misinformation at unprecedented scale. …
          • they cost almost nothing to operate, …”

        • David:

          And, on the other side, when I told Yann LeCun that I’m scared of drones flying around with machine guns, he replied:

          I’m scared of machine guns in the wrong hands, which includes those of average citizens.
          I’m scared of machine guns mounted on drones controlled by unsavory characters.
          But I don’t really mind them in the hands of the well-trained military of a liberal democracy for the defense of freedom.

        • David
          I wholeheartedly agree with your conclusion about danger, terror, etc. But I think the point about these systems being “inherently unreliable” is somewhat off-point – it can undermine that conclusion. People are also inherently unreliable. Which is more unreliable appears to vary depending on the task. At this point, I think AI is reliable enough to fool people into believing things that are wrong, as much as people can do that, in many circumstances. My perhaps-overestimate of the capabilities of AI does not reassure me – it is what makes these developments so terrifying. If these systems were obviously misguided, there would be less to worry about. It is precisely because they are so capable that the real dangers emerge.

        • Replying to Andrew and Dale:

          To keep things clear here, I was quoting Gary Marcus.

          My axe to grind here is exactly and only that these things only _appear_ to make sense, and that any sense or nonsense _observed_ in their output is exactly and only in the head of the reader: the outputs themselves are exactly and only meaningless strings of symbols that have no relationship with the real world whatsoever. Even when said outputs happen to be correct or useful, the system itself had no way of knowing that. (This happens because they recombine strings that originally did reflect causal reality in the real world. But they don’t deal with said causal reality in the real world, they just recombine things that originally did so.)

          It’s an artifact of their design that they can mimic the things they are told to mimic really well. It’s kewl that a parlor trick can do so well. But I don’t consider it scientifically or intellectually interesting that a parlor trick can work so well. YMMV on that, of course.

          But the bottom line is that everything they spit out is ungrounded by any causal reasoning whatsoever. These systems don’t do causal reasoning.

          If you think they do causal reasoning, you have been fooled by a parlor trick.

          Consider Gary Marcus’ example:

          “ask them to explain why crushed porcelain is good in breast milk, and they may tell you that ‘porcelain can help to balance the nutritional content of the milk, providing the infant with the nutrients they need to help grow and develop.’”

          They don’t understand anything about porcelain, glass, or infant nutrition. They are pure, unadulterated parlor trick.

        • David
          I guess we don’t agree. I’ll agree with your diagnosis – I don’t believe ChatGPT does causal reasoning. It is only a (perhaps sophisticated) mimic. I’m not so sure humans are different. We learn to mimic things. There is the issue of free will and whether humans can really create new things in a different way than a computer – I defer on that question; I simply don’t know (although I would like to believe there is something special about humans that a computer cannot do). But my point is that it doesn’t matter much at a practical level. When ChatGPT can produce things that many people cannot distinguish from human creations, then the dangers (and potentials) are already here. Philosophers can and should debate the meaning of consciousness, but I worry about where these technological capabilities will lead. A gun is simply a mechanical creation of humans, but we haven’t figured out how to control that technology very well (my opinion), so how are we going to control AI? And who is going to control the use of it?

      • Josh:

        Yeah, “millions of attention-seekers at millions of typewriters” is about right. Not just attention-seekers, also people who are just doing their jobs. Workaday scientists. Not genuine truth-seekers but they think they’re genuine truth-seekers. These are not people who are trying to be famous or have Ted talks or whatever, they’re just trying to do science the way they’ve been taught to do.

        I guess that will be true of the future AI-generated papers. The AIs don’t want attention; they’re just trying to do their jobs too!

  5. For great journalism on really serious fraud, I recommend Eugenie Samuel Reich’s Plastic Fantastic (2009):
    https://www.amazon.com/Plastic-Fantastic-Biggest-Physics-Scientific-ebook/dp/B002BZDDNS

    I was at Bell Labs 1973–83, and at that time internal review before submitting a paper was ferocious, generally thought tougher than external peer review, because BTL didn’t want bad papers getting out.
    A Member of Technical Staff (MTS) writes a paper => Supervisor (SV) => Dept Head => Director => Executive Director (ED1), who sends it to two others, ED2 and ED3,
    who send it down their own management chains until it gets to some SV or MTS who is an expert. They’d review it, and the reports went back up to ED2 and ED3, over to ED1, then down to the original MTS through their management chain. It of course behooved one NOT to get a terrible review back through one’s whole management chain, so it might be a better idea to run the paper past likely reviewers first.

    When I was an SV, one of my MTSs deprecated a paper from another area; I agreed, and my ED copied me on the note he sent back to the originating ED: “Dear X, once again, my people think one of your folks’ papers is bad and I agree.”

    Sadly, by the time of Plastic Fantastic, it seems that such rigor of review had broken down.
