Don’t believe everything you read in the (scientific) papers

A journalist writes in with a question:

This study on [sexy topic] is getting a lot of attention, and I wanted to see if you had a few minutes to look it over for me . . .

Basically, I am somewhat skeptical of [sexy subject area] explanations of complex behavior, and in this case I’m wondering whether there’s a case to be made that the researchers are taking a not-too-strong interaction effect and weaving a compelling story about it. . . .

Anyway, obviously not expecting you to chime in on [general subject area], but thoughts on whether the basic stats are sound here would be appreciated.

The paper was attached, and I looked at it.

My reply to the journalist:

I’ll believe it when I see a pre-registered replication. And not before.

P.S. Just to be clear, I wouldn’t give that response to every paper that is sent to me. I’m convinced by lots and lots of empirical research that hasn’t been replicated, pre-registered or otherwise. But in this case, where there’s a whole heap of possible comparisons, all of which are consistent with the general theory being expressed, I’m definitely concerned that we’re in “power = 0.06” territory.
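To see what "power = 0.06" territory means, here is a minimal sketch (illustrative numbers, not taken from the paper in question) of the power of a standard two-sided z-test when the true effect is tiny relative to the standard error of the estimate:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power(effect, se, z_crit=1.96):
    """Power of a two-sided z-test at alpha = 0.05 when the true effect
    is `effect` and the estimate's standard error is `se`."""
    shift = effect / se
    return Phi(-z_crit - shift) + 1 - Phi(z_crit - shift)

# A true effect only 0.3 standard errors in size gives power of about
# 0.06: barely above the 5% "significance" rate of a pure-noise study.
print(round(power(effect=0.3, se=1.0), 2))  # -> 0.06
```

In that regime, a statistically significant result tells you almost nothing: it is nearly as likely under noise alone as under the hypothesized effect, and the estimates that do clear the significance bar will wildly exaggerate the effect size.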


  1. Phil says:

    Whoops, don’t forget, not a “sexy topic”, a “hot topic” or a “topic of the month” or similar.

    • Andrew says:
      No, it was literally a sexy topic. The paper was about sex. It was about sexual intercourse.

    • Anonymous says:

      Can you please stop using the word “Phil”? My family hails from the ancestral homeland of the Philistines. People would say to me “hey Phil . . . istine”. I cried so many times in college at this insensitive language that I eventually dropped out of school. Every time I hear the name “Phil” it reminds me of the oppression of my people.

      Please stop oppressing me and please add “Phil” to the list of banned words. Thank you.

  2. Noah Motion says:

    Any particular reason you don’t want to publish more details about the specific paper, the topic, or the general subject area?

    • Andrew says:

      It’s no big deal, I just wanted to make the point that this sort of skepticism has become my general reaction. I wanted to make that point without getting into the details of some silly study.

      What stunned me in encountering this example was how little time it took for me to come to this conclusion. In the old days I would’ve agonized over possible flaws in the paper. But now I know to just not believe these little studies. Not feeling that I have to believe . . . it’s so liberating!

      • Rahul says:

        You can find a “whole heap of possible comparisons” literally in every paper. I think it’s just idiosyncratic which ones we choose to not believe.

    I’m wondering, could you train a heuristic computer algorithm to score the believability of a paper? It might be a worthwhile tool.

        • question says:

          Yes, you search for images containing dynamite charts (the ones with the error bars) and grep for “p<0.05”. There are some exceptions to those rules, but really it is too late to worry about that.

        • Wouldn’t work, because you cannot know how many unreported analyses were done before the final one was settled on.

          • Rahul says:

            Yes but neither does a human reader.

          • Keith O'Rourke says:

            Right, to take a study as credible evidence you need “insider information about the folks that did it”: e.g., did they have a focused hypothesis, was the study designed to address it, did they ensure the data was accurately recorded and double-checked, would they have held off if they were unsure, and would they have reported the suspected weaknesses?

            Waiting for replication is a cure for ignorance of that (as well as a cure for being mistaken about it).

            Good thing everyone is clear about exactly how one assesses replication ;-)

            • Andrew says:

              Traditionally, “published in a top journal” was considered to be this sort of insider information, but that won’t work anymore (if it ever did). Another possibility is if researchers have “skin in the game,” that is, if they get some benefit for making correct claims and get hurt for making claims that don’t hold up. I think that holds in some areas of survey research, for example.
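              The text-screening half of the heuristic that question describes above is trivially mechanizable. A minimal sketch (hypothetical directory layout; assumes the papers have already been converted to plain-text files):

              ```python
              import re
              from pathlib import Path

              # Matches bare significance claims like "p<0.05" or "p < .05".
              P_PATTERN = re.compile(r"p\s*[<=]\s*0?\.05")

              def flag_papers(directory):
                  """Return names of .txt files containing a bare p<0.05 claim."""
                  return [path.name
                          for path in sorted(Path(directory).glob("*.txt"))
                          if P_PATTERN.search(path.read_text(errors="ignore"))]
              ```

              Of course, as the replies note, this only catches what was reported; the unreported analyses behind a forking-paths result leave no trace for any grep to find.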

  3. Todd says:

    +1 for replication. Counterintuitive or bizarre findings should be replicated BEFORE publication, especially given how easy it is to collect data these days. While some blame clearly rests with the study authors, journal editors are also to blame for accepting work that has not been thoroughly vetted with different samples and in various contexts. It makes the entire discipline look bad when laughable work is the center of attention.

  4. jonathan says:

    I recently read an interesting paper on minimum wage and employment. The key section, however, completely left me cold: they used a proxy to address the obvious issue that different states have different economies, so how do you unravel the effects of a particular economy versus the effect of a minimum wage increase? They chose a rational proxy, a housing index. But then the paper basically goes dark, so I’m left at the end with a bunch of nonsense: even if you place 100% confidence in the logical relation of this proxy to the issue, you still can’t figure out how much the calculation depends on that versus whatever internally generated response they find apart from the proxy. And since I can’t place anything like that degree of confidence in it, I can’t say it’s much more than a modeling exercise. My point: logic, clearly explained and ideally without huge gaps, overcomes some replication issues or, perhaps more accurately, helps focus replication more exactly. But when your model is pinned-together bits, I have no interest in replication.
