Statistical behavior at the end of the world: the effect of the publication crisis on U.S. research productivity

Under the heading, “I’m suspicious,” Kevin Lewis points us to this article with abstract:

We exploit the timing of the Cuban Missile Crisis and the geographical variation in mortality risks individuals faced across states to analyse reproduction decisions during the crisis. The results of a difference-in-differences approach show evidence that fertility decreased in states that are farther from Cuba and increased in states with more military installations. Our findings suggest that individuals are more likely to engage in reproductive activities when facing high mortality risks, but reduce fertility when facing a high probability of enduring the aftermath of a catastrophe.

It’s the usual story: forking paths (nothing in the main effect, followed by a selection among the many many possible two-way and three-way interactions that could be studied), followed by convoluted storytelling (“individuals indulge in reproductive activities when facing high mortality risks, but reduce fertility when facing the possibility to endure the aftermath of the impending catastrophe”). This sort of paper reflects the problems with null hypothesis significance testing amidst abundant researcher degrees of freedom (find a statistically significant p-value, then tell a story) and also the problems with the “identification strategy” approach that is now standard in the social sciences: if you can do a regression discontinuity, or a difference in differences, or an instrumental variable, then all skepticism is turned off.

That said, I have no objection to this paper being published: it’s a pattern in data, no reason not to share it with the world. No reason to take it seriously, either.

P.S. I’m not trying to single this paper out, it just happens to be something that someone emailed to me. I think it’s good for us to take the general position not to get all excited about these sorts of correlations that people dredge up.

Sometimes I think this sort of criticism annoys people—Why do I bring up forking paths again? Doesn’t every study have forking paths? Etc. But the replication crisis is real. When there’s no good empirical evidence and no good theory, there’s no reason to take a study seriously.

The replication crisis arises because:
1. Lots of studies don’t replicate.
2. Many of these studies are written up in a superficially convincing way, with identification strategies and statistically significant p-values.

So, no, it’s not enough that an idea seems plausible and is backed up by low p-values. The plausibility is open-ended—these theories typically could predict the exact opposite behavior as seen in the data, and indeed sometimes do—and the p-values tell us next to nothing.

To put it another way, suppose this study were exactly the same but it had never been published in a peer-reviewed journal. Maybe it was on a preprint server, or, ummm, I dunno, just on some grad student’s computer, or maybe just typed up on some pile of papers that you came across in the bus station. Then would you take it seriously? Then would you feel I’m being unfair to pick on the forking paths? The point is that, as Simmons/Nelson/Simonsohn and others (intentionally) and Bem and others (unintentionally) have pointed out many times, it’s easy to get statistical significance from pure noise. And as Deaton/Cartwright and others have pointed out, the presence of some sort of identification strategy in an analysis does not imply that it tells us anything about causal inference and generalizations outside the data at hand. There’s nothing wrong with researchers coming up with interesting ideas (in this case, looking at external risks and reproductive behavior), but we’re in kangaroo territory here. Or maybe not; draw your own conclusions. The data just don’t support the strong claims made in the paper.
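
To make the "statistical significance from pure noise" point concrete, here is a minimal simulation sketch in Python. It is not the paper's analysis and uses none of its data. Everything in it is an assumption for illustration: 50 hypothetical states, an outcome that is pure standard-normal noise, and 20 candidate "forks," each a random yes/no split of the states (think "far from Cuba" or "many military installations"). Each fork gets a two-sided z-test with the noise standard deviation treated as known.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(outcome, group):
    """Two-sided p-value for a difference in group means, assuming known sd = 1."""
    a = [y for y, g in zip(outcome, group) if g]
    b = [y for y, g in zip(outcome, group) if not g]
    if not a or not b:  # degenerate split: nothing to compare
        return 1.0
    z = (fmean(a) - fmean(b)) / (1 / len(a) + 1 / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_with_a_hit(n_forks, n_states=50, n_sims=2000, alpha=0.05, seed=1):
    """Share of pure-noise datasets in which at least one fork has p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # The outcome is noise: no split of the states has any true effect.
        outcome = [rng.gauss(0, 1) for _ in range(n_states)]
        found = False
        for _ in range(n_forks):
            # A hypothetical binary state characteristic, unrelated to the outcome.
            group = [rng.random() < 0.5 for _ in range(n_states)]
            if z_test_p(outcome, group) < alpha:
                found = True
                break
        hits += found
    return hits / n_sims


if __name__ == "__main__":
    print("one pre-chosen comparison:", share_with_a_hit(1))
    print("best of 20 comparisons:   ", share_with_a_hit(20))
```

With a single comparison chosen in advance, the share of "significant" results should sit near the nominal 5%. With 20 comparisons to choose from, it should be well over half. If the 20 tests were fully independent the figure would be 1 - 0.95^20, or about 0.64. And this sketch only counts forks that were actually run, whereas the forking-paths problem also covers analyses that would have been run had the data come out differently.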


  1. Terry says:

    Off topic:

    Dan’s previous post on sexual harassment drew an enormous amount of interest. I would find it interesting to have Dan write up a blog post addressing the Aziz Ansari sexual harassment case. It would be very helpful in clarifying some of the issues raised in his previous post.

    To focus attention, I suggest he address the following two questions and limit himself to these two questions.

    1. Taking the woman’s allegation at face value, was this sexual harassment? Use only this article and Ansari’s statement:, and

    2. Assume you are a college administrator tasked with dealing with allegations of sexual harassment as you see fit. You are presented with this case, and you have only these two statements to go on. What do you do?

    • Andrew says:


      Not to speak for Dan, but I doubt that blogging more on sexual harassment is on his agenda!

    • Anonymous says:

      If I remember correctly, Dan’s post was about sexual harassment in statistics so it was relevant here. Your post is about a comedian, leading me to an honest question about webtiquette.

      If you went to a blog about sexual harassment and started trying to get a conversation about statistical technicalities going, would it be considered rude? How would you expect the host to respond?

      I realize that this is only perpetuating the off topic discussion but I really am wondering about this.

      • Terry says:

        I would completely understand if Andrew/Dan ignored or even deleted my comment because it is off topic. Their blog, their rules.
        (Although Andrew has an admirable policy of not deleting posts). That is why I labeled it as off topic.

        Dan’s harassment post was very popular and the discussion wide ranging. Dan clearly relished the issue, so I thought he might want to extend the discussion. I learned a lot from the last one by just observing, and I would like to learn more. Since much of the previous discussion was rather general, I thought it would sharpen the discussion to consider a very specific set of facts.

        Again, it was only a suggestion. Not my blog, not my rules.

      • Corey Yanofsky says:

        “netiquette”, for the love of all internet traditions!

  2. Jonathan (another one) says:

    I was six years old during the Cuban missile crisis, in school in Atlanta. In advance of each month, our teachers had us create folders to hold the next month’s work. The cover of my November 1962 folder, prepared at the height of the crisis, depicted a graveyard with the names of my classmates on the headstones (it didn’t help that Halloween was coming.) Even though we were told that our proximity to Cuba and Warner-Robins AFB made us a prime target, I assure you that none of us first graders considered sexual activity as a plausible response, and, even though very close, we all remained infertile for a substantial period post-crisis. So I doubt this study’s finding.

  3. Paul Fisher says:

    A couple notes:

    1) This is perfectly timed given the events in Hawaii this past weekend.

    2) Someone should measure birthrates/abortion rates in Hawaii over the next 9 months to see if there was a spike in fertility driven by the belief in a nuclear apocalypse. This would serve as a nice replication of the discussed pattern. My prior is little to no long-term effect.

    • Jonathan (another one) says:

      This just shows how little you understand replication. When we get little effect, this is because: (a) a several hour crisis differs from a multi-day crisis; (b) Hawaii is composed of islands, and there’s an “island effect;” (c) the movie Dr. Strangelove changed everyone’s behavior; or (d) all of the above. Unless there’s a big effect, in which case critics of the first study get to make exactly the same arguments going the other way.

      • Paul Fisher says:

        (a) We can assume that, being in an era of social media, several weeks in the 60s is the same as several hours. (b) Cuba is an island too. (c) Given the film’s age, the only people who are of age to have children and have seen the movie are social outcasts unlikely to have reproductive opportunities (Evidence is self). (d) Trump’s election indicates nostalgia for the 50s and 60s making the comparison valid.

        In all seriousness, the sample size is likely too small to get reliable estimates either way.

  4. Daniel Ozer says:

    Suppose the last (quoted) sentence had read: “Our findings lead us to wonder whether individuals are more likely to engage in reproductive activities when facing high mortality risks, but reduce fertility when facing a high probability of enduring the aftermath of a catastrophe?”
    Should a report of the data concluding with a question rather than a conclusion about human nature be published? Would it have been?