“Here’s an interesting story right in your sweet spot”

Jonathan Falk writes:

Here’s an interesting story right in your sweet spot:

Large effects from something whose possible effects couldn’t be that large? Check.
Finding something in a sample of 1024 people that requires 34,000 to gain adequate power? Check.
Misuse of p values? Check.
Science journalist hype? Check.
Searching for the cause of an effect that isn’t real enough to matter? Check.
Searching for random associatedness? (Nostalgia-proneness?) Check.
Multiple negative studies ignored? Check.
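
About that second check mark: here is a purely illustrative, back-of-the-envelope power sketch (a normal approximation with two equal groups and assumed standardized effect sizes, not the actual gene-by-environment design on depression risk). It just shows how quickly the required sample size blows up as the effect shrinks, and how much larger an effect has to be before a total sample of 1,024 can detect it with conventional 80% power.

    # Illustrative power sketch under assumed numbers (alpha = 0.05, power = 0.80,
    # two equal groups, normal approximation); not the original study's design.
    from scipy.stats import norm

    ALPHA, POWER = 0.05, 0.80
    z_a = norm.ppf(1 - ALPHA / 2)   # two-sided critical value
    z_b = norm.ppf(POWER)

    def n_per_group(d):
        """n per group needed to detect a standardized mean difference d."""
        return 2 * ((z_a + z_b) / d) ** 2

    def detectable_d(n_grp):
        """Smallest standardized difference detectable with 80% power at n per group."""
        return (z_a + z_b) * (2 / n_grp) ** 0.5

    # Total sample needed for a few assumed effect sizes
    # (d = 0.03 is in the ballpark of the 34,000 figure)...
    for d in (0.20, 0.10, 0.05, 0.03):
        print(f"d = {d:.2f}: total n ≈ {2 * n_per_group(d):,.0f}")

    # ...versus the smallest effect a 1,024-person sample (512 per group) can
    # reliably detect, which is several times larger.
    print(f"n = 1,024: detectable d ≈ {detectable_d(512):.2f}")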

And some great observations from Scott Alexander [author of the linked post]:

First, what bothers me isn’t just that people said 5-HTTLPR mattered and it didn’t. It’s that we built whole imaginary edifices, whole castles in the air on top of this idea of 5-HTTLPR mattering. We “figured out” how 5-HTTLPR exerted its effects, what parts of the brain it was active in, what sorts of things it interacted with, how its effects were enhanced or suppressed by the effects of other imaginary depression genes. This isn’t just an explorer coming back from the Orient and claiming there are unicorns there. It’s the explorer describing the life cycle of unicorns, what unicorns eat, all the different subspecies of unicorn, which cuts of unicorn meat are tastiest, and a blow-by-blow account of a wrestling match between unicorns and Bigfoot.

Alexander links to this letter by Nina Rieckmann, Michael Rapp, and Jacqueline Müller-Nordhorn published in 2009 in the Journal of the American Medical Association:

Dr Risch and colleagues concluded that the results of a study showing that the serotonin transporter gene (5-HTTLPR) genotype moderates the effect of stressful life events on the risk of depression could not be replicated in a meta-analysis of 14 studies. The authors pointed out the importance of replication studies before new findings are translated into clinical and health practices. We believe that it is also important to note that editorial practices of scientific journals may contribute to the lack of attention received by studies that fail to replicate original findings.

The original study was published in 2003 in Science, a prominent journal with a very high impact factor that year. In the year following its publication, it was cited 110 times in sources indexed in the Web of Science citation report. In 2005, the first study that failed to replicate the original finding in a sample of 1091 participants was published in Psychological Medicine, a specialized journal with a relatively low impact factor. That study was cited 24 times in the following year.

We believe that unless editors actively encourage the submission of null findings, replication studies, and contradictory results alike, the premature uncritical adoption of new findings will continue to influence the way resources are allocated in research and clinical practice settings. Studies that do not replicate an important finding and that meet high methodological standards should be considered for publication by influential journals at the same level as the respective original reports. This will encourage researchers to conduct replication studies and to make primary data easily accessible.

I can’t really comment on the substance or the statistics here, since I haven’t read any of these articles, nor do I have any understanding of the arguments from genetics.

Setting all that aside, it’s striking (a) that Rieckmann et al. made this impassioned plea back in 2009, well before the general awareness of replication issues in science, and (b) that this was all 10 years ago yet it seems to still be a live issue, as Alexander is writing about it now. Rieckmann et al. must find all this very frustrating.

Jonathan pointed me to this story in May, and I informed him that this post would appear in Oct. Given that the problem was pointed out 10 years ago (in JAMA, no less!), I guess there’s no rush.

P.S. More on this from savvy science writer Ed Yong.

13 thoughts on ““Here’s an interesting story right in your sweet spot””

  1. First, what bothers me isn’t just that people said 5-HTTLPR mattered and it didn’t. It’s that we built whole imaginary edifices, whole castles in the air on top of this idea of 5-HTTLPR mattering. We “figured out” how 5-HTTLPR exerted its effects, what parts of the brain it was active in, what sorts of things it interacted with, how its effects were enhanced or suppressed by the effects of other imaginary depression genes. This isn’t just an explorer coming back from the Orient and claiming there are unicorns there. It’s the explorer describing the life cycle of unicorns, what unicorns eat, all the different subspecies of unicorn, which cuts of unicorn meat are tastiest, and a blow-by-blow account of a wrestling match between unicorns and Bigfoot.

    What will bother him even more is to find out that this is not only unsurprising, it is the standard state of affairs in biomedical research, and should be expected just based on knowing “how the sausage is made.” But even if you have no direct experience, you can look at what happens when people try to replicate this junk:

    A) Out of 50 studies, 32 were dropped because it was too expensive even to figure out how to replicate them. Of the rest, only half were “mostly repeatable”. That was last summer; not sure what the update is on this.
    https://www.sciencemag.org/news/2018/07/plan-replicate-50-high-impact-cancer-papers-shrinks-just-18

    B) Amgen reports being unable to replicate about 90% of the landmark preclinical cancer papers it checked, and has since published three more such failed-confirmation studies: https://www.nature.com/news/biotech-giant-publishes-failures-to-confirm-high-profile-science-1.19269

    C) Bayer reports that only 25% of studies could be replicated: https://www.nature.com/articles/nrd3439-c1

    D) An NIH-run study reports that only 1 of 12 spinal cord injury studies fully replicated: https://www.sciencedirect.com/science/article/pii/S0014488611002391

    https://statmodeling.stat.columbia.edu/2019/03/05/back-to-basics-how-is-statistics-relevant-to-scientific-discovery/

    And figuring out how to reproducibly get the same results is the easy part. Interpreting them correctly is the hard part.

    • “Interpreting them correctly is the hard part.”

      Isn’t the idea to create an experiment that gives a “this or that” result? That is, an experiment should be a hypothesis test (meaning a logical hypothesis, not a statistical “null hypothesis”).

      I’m listening to a book right now on the evolutionary impacts of horizontal gene transfer. The book recounts an experiment by a researcher working on infectious bacteria, I think during the 1920s. The researcher recognized “smooth” and “rough” morphologies among the four types or strains of the bacteria, with smooth corresponding to high virulence and rough to low virulence. But when he mixed heat-killed smooth bacteria with live rough bacteria and injected the mixture into mice, the mice died. The only possible conclusion was that the live rough bacteria had somehow obtained virulence from the dead smooth bacteria. The experiment couldn’t determine how – that’s for the next hypothesis. But the experiment left no room for interpretation: some sort of virulence transfer occurred.

  2. From the abstract of the paper behind this:

    Instead, the results suggest that early hypotheses about depression candidate genes were incorrect and that the large number of associations reported in the depression candidate gene literature are likely to be false positives.

    https://www.ncbi.nlm.nih.gov/pubmed/30845820

    These were not false positives, they were true positives. The null model really was wrong. It is just going to be more of the same until people figure this out.

      • Yes, the null model being false has little to do with their favorite explanation being correct. Recognizing this is the first step required to get back on the path to science.
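
        A minimal toy simulation of this point (with assumed numbers, not real genetic data): when the true effect is tiny but not exactly zero, a large enough sample will reject the zero-effect null essentially every time, and that rejection by itself says nothing about whether the candidate-gene story is right.

            # Toy simulation with assumed numbers: a scientifically negligible but
            # nonzero association becomes "significant" once n is large enough.
            import numpy as np
            from scipy import stats

            rng = np.random.default_rng(0)
            TRUE_EFFECT = 0.02   # assumed true difference in outcome, in SD units

            for n in (1_000, 10_000, 100_000, 1_000_000):
                genotype = rng.integers(0, 2, size=n)                  # carrier vs. non-carrier
                outcome = TRUE_EFFECT * genotype + rng.normal(size=n)  # e.g. a symptom score
                _, p = stats.ttest_ind(outcome[genotype == 1], outcome[genotype == 0])
                print(f"n = {n:>9,}: p = {p:.3g}")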

  3. A couple of choice quotes from the Yong article:

    “Sometimes, researchers futz with their data until they get something interesting, or retrofit their questions to match their answers.”

    “Dorothy Bishop of the University of Oxford argues that institutions and funders that supported candidate-gene work in depression should also be asking themselves some hard questions. “They need to recognize that even those who think they are elite are not immune to poor reproducibility, which leads to a huge amount of waste,” she says.”

  4. Well! A visit from the Assumptions!

    “Back then, tools for sequencing DNA weren’t as cheap or powerful as they are today. When researchers wanted to work out which genes might affect a disease or trait, they made educated guesses, and picked likely “candidate genes.””

    I guess everyone forgot that the educated guess about the location of the bridge might be wrong and proceeded to drive off the cliff.

    • Would be interesting to think through the incentives here. Obviously, there wasn’t sufficient incentive to confirm or build more support for the educated guess, but why?

      Is / was conventional wisdom so powerful that no one could even imagine it was wrong?

      Is it because there’s no big career reward for taking the punch bowl away from the party? What’s the “impact factor” for party killers?

      Maybe NSF should offer “debunking grants”!

    • D:

      I think that’s a bit harsh. As we discussed in the link thread, I see where you’re coming from with your criticisms but I don’t think the paper is so bad. That said, I would agree with the weaker claim that, even if the paper is not so bad, it’s still observational, so Yong should’ve reported the results with a bit of skepticism.

      I still think Yong is savvy, even though in this particular case he may have been a bit too credulous.

      Regarding his lack of response: All I can say is that, in general, I wish there were some expectation that journalists could go back and assess their earlier work. It seems that journalism is just set up for reporters to go on to the next story and never look back.

  5. > I don’t think the paper is so bad.

    Unless important people like you start judging harshly those who, like these authors, refuse even to share their code, progress will be difficult.

    I view a refusal to share code as a mortal sin. No authors who do this deserve to have their research described as not so bad.

    Also, I see this paper as much worse than the typical garden-path walking observational mess. They make a specific claim: that patient assignment can be treated as quasi-random. Their own data shows (obviously!) that this is nonsense. They were either too stupid to notice this or too venal to admit it.

    > I still think Yong is savvy

    Well, he is savvy at giving big press to nonsense studies with ideologically pleasing (to him) conclusions.
