“Figure 1 looks like random variation to me” . . . indeed, so it does. And Figure 2 as well! But statistical significance was found, so this bit of randomness was published in a top journal. Business as usual in the statistical-industrial complex. Still, I’d hope the BMJ could’ve done better.

Gregory Hunter writes:

The following article made it to the national news in Canada this week.

I [Hunter] read it and was fairly appalled by their statistical methods. It seems that they went looking for a particular result in Canadian birthrate data, and then arranged to find it. Figure 1 looks like random variation to me. I don’t know if it warrants mention in your blog, but it did get into the British Medical Journal.

That’s too bad about it being in the British Medical Journal. Lancet, sure, they’re notorious for publishing politically motivated clickbait. But I thought BMJ was more serious. I guess anyone can make mistakes.

Anyway, getting to the statistics . . . the article is called “Outcome of the 2016 United States presidential election and the subsequent sex ratio at birth in Canada: an ecological study,” and its results are a mess of forking paths:

We hypothesised that the unexpected outcome of the 2016 US presidential election may have been a societal stressor for liberal-leaning populations and thereby precipitated such an effect on the sex ratio in Canada. . . . In the 12 months following the election, the lowest sex ratio occurred in March 2017 (4 months post election). Compared with the preceding months, the sex ratio was lower in the 5 months from March to July 2017 (p=0.02) during which time it was rising (p=0.01), reflecting recovery from the nadir. Both effects were seen in liberal-leaning regions of Ontario (lower sex ratio (p=0.006) and recovery (p=0.002) in March–July 2017) but not in conservative-leaning areas (p=0.12 and p=0.49, respectively).

In addition to forking paths, we also see the statistical fallacy of comparing significant to non-significant.
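The fallacy is worth spelling out with a toy example (made-up numbers, not from the paper). An estimate of 25 with standard error 10 is "significant" (p ≈ 0.01); an estimate of 10 with standard error 10 is not (p ≈ 0.32); but the difference between them, 15 with standard error sqrt(10^2 + 10^2) ≈ 14, is nowhere near significant:

```python
# Toy example of the compare-significant-to-non-significant fallacy.
# All numbers are made up for illustration.
from math import sqrt
from scipy.stats import norm

def two_sided_p(estimate, se):
    """Two-sided p-value from a normal-theory z-test."""
    return 2 * norm.sf(abs(estimate / se))

est_a, se_a = 25.0, 10.0   # subgroup A: z = 2.5
est_b, se_b = 10.0, 10.0   # subgroup B: z = 1.0

print(two_sided_p(est_a, se_a))   # ~0.012 -> "significant"
print(two_sided_p(est_b, se_b))   # ~0.317 -> "not significant"

# The comparison that actually matters is A minus B:
se_diff = sqrt(se_a**2 + se_b**2)            # ~14.1
print(two_sided_p(est_a - est_b, se_diff))   # ~0.29 -> not significant
```

So "an effect in liberal-leaning regions but not in conservative-leaning ones" is exactly the pattern you can get when the two sets of regions don't differ at all.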

To their credit, the authors show the data:

As is often the case, if you look at the data without all those lines, you see something that looks like a bunch of numbers with no clear pattern.

The claims made in this article do not represent innumeracy on the level of saying that the probability of a tied election is 10^-90 (which is off by a factor of 10^83), nor on the level of that TV commenter and newspaper editor who said that Mike Bloomberg spent a million dollars on each voter (off by a factor of 10^6), but they're still wrong.
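(The Bloomberg arithmetic, for the record, using rough figures from memory: about $500 million in campaign spending, about 330 million Americans.)

```python
# Rough figures from memory, order-of-magnitude only:
spending = 500e6        # ~$500 million spent on the campaign
population = 330e6      # ~330 million US residents
per_person = spending / population
print(per_person)             # ~$1.50 per person, not $1 million
print(1e6 / per_person)       # the claim was off by a factor of ~10^6
```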

Just to get a baseline here: There were 146,000 births in Ontario last year. 146,000/12 ≈ 12,000 births per month. So, just from pure chance, we'd expect the monthly proportion of girl births to vary with a standard deviation of 0.5/sqrt(12000) ≈ 0.005. For example, if the baseline rate is 48.5% girls, it could jump to 48.0% or 49.0% from month to month. The paper in question reports sex ratio, which is (1-p)/p, so proportions of 0.480, 0.485, and 0.490 convert to sex ratios of 1.08, 1.06, and 1.04. Or, if you want to do +/-2 standard deviations, you'd expect to see sex ratios varying from roughly 1.10 to 1.02, which is indeed what we see in the top figure above. (The lower figures are each based on less data, so of course they're more variable.) Any real effects on sex ratio will be tiny compared to this variation in the data (see here for discussion of this general point).
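Here's that back-of-the-envelope calculation as a short Python sketch (same numbers as above; the 0.5 in the standard deviation formula is the conservative binomial value at p = 0.5):

```python
# How much should Ontario's monthly sex ratio bounce around
# from pure binomial chance alone?
from math import sqrt

births_per_month = 146_000 / 12     # ~12,167 births per month
p_girl = 0.485                      # baseline proportion of girl births
sd = 0.5 / sqrt(births_per_month)   # ~0.0045: monthly sd of the proportion

def sex_ratio(p):
    """Male:female sex ratio implied by a proportion p of girl births."""
    return (1 - p) / p

print(round(sex_ratio(p_girl), 2))            # baseline: ~1.06
print(round(sex_ratio(p_girl + sd), 2))       # ~1.04
print(round(sex_ratio(p_girl - sd), 2))       # ~1.08
print(round(sex_ratio(p_girl + 2 * sd), 2))   # ~1.02
print(round(sex_ratio(p_girl - 2 * sd), 2))   # ~1.10: the observed range
```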

In short: this study was dead on arrival. But the authors fooled themselves, and the reviewers, with a blizzard of p-values. As low as 0.002!

So, let me repeat:

– Just cos you have a statistically significant comparison, that doesn’t necessarily mean you’ve discovered anything at all about the world.

– Just cos you have causal identification and a statistically significant comparison, that doesn’t necessarily mean you’ve discovered anything at all about the world.

– Just cos you have honesty, transparency, causal identification, and a statistically significant comparison, that doesn’t necessarily mean you’ve discovered anything at all about the world.

– Just cos you have honesty, transparency, causal identification, a statistically significant comparison, a clear moral purpose, and publication in a top journal, that doesn’t necessarily mean you’ve discovered anything at all about the world.

Sorry, but that’s the way it is. You’d think everyone would’ve learned this—it’s been nearly a decade since that ESP paper was published—but I guess not. The old ways of thinking are sticky. Sticky sticky sticky.

Again, no special criticism of the authors of this new paper. I assume they're just doing what they were trained to do, and what they're now rewarded for doing. Don't hate the player etc.

22 thoughts on this post

  1. Did they really predict beforehand that a surprising US presidential election result would lead to a ~3% lower sex ratio 4 months later in liberal-leaning areas of Ontario?

    If so then there may be something to whatever model they used.

    • Anon:

      Based on my reading of the sex ratio literature, an effect of 0.03 on the probability of a girl birth is extremely implausible. An effect of 0.0003, maybe. But a study of this size could not detect such differences. It's a kangaroo situation: weighing a feather on a bathroom scale, while the feather sits in the pouch of a jumping kangaroo.
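      To put a number on "could not detect": suppose the true effect really were a shift of 0.0003 in the proportion of girl births, against monthly sampling noise of about 0.005 from roughly 12,000 births. A rough power calculation (illustrative numbers, not a reanalysis of their data):

```python
# Rough power calculation: chance that a monthly comparison detects
# a plausible-sized effect on the proportion of girl births.
from scipy.stats import norm

true_effect = 0.0003   # assumed true shift in the proportion of girl births
se = 0.005             # monthly sampling noise from ~12,000 births
z = true_effect / se   # ~0.06: the signal is tiny relative to the noise

# Probability that a two-sided z-test at alpha = 0.05 rejects:
power = norm.sf(1.96 - z) + norm.cdf(-1.96 - z)
print(power)           # ~0.05: barely above the 5% false positive rate
```

      In other words, the test flags the true effect about as often as it flags pure noise, which is what "could not detect" means in practice.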

      • If they had predicted a drop of that magnitude beforehand, then its surprisingly large size would offer even more support for their model. I just don't think they actually predicted any of this beforehand.

        There does seem to be a vague concept that if a stressful event happens in months 4-6 of pregnancy, something happens that selectively raises male fetal mortality. So there is some precision on the timing (though 3 months is still 30% of the pregnancy), but none on the magnitude, or on how the effect should depend on the nature and magnitude of the stressor, etc.

        • They could predict that liberal-leaning people in Canada will be only half as surprised if he's re-elected, so next March there should be a 1.5% dip in the sex ratio. Or maybe, if the election results are delayed due to mail-in voting, the dip should occur in April instead of March.

  2. Any study whose value strongly depends on a significant finding should not even be undertaken. Would this study have been accepted anywhere if they found that the election had no appreciable effect on the sex ratio?

    • If the results were not significantly different from what was predicted, they could be interesting, as long as the prediction was precise enough to be otherwise surprising (i.e., not very consistent with other proposed explanations).

      • I didn’t read the paper. If they had a well-reasoned prediction of meaningful precision, then the article is not as ridiculous as it appears at first glance. I doubt that, but I have no background in baby sex ratio science.

        • Nope, they do not have that. There are some previous papers claiming an increase in male fetal mortality after stressful events, but nothing very precise.

          My angle is more that this approach of testing the default null hypothesis does not tell us how well their hypothesis performed. They never test their hypothesis at all.

  3. Andrew wrote:

    “Just cos…”

    Seems like these bullet points would work as an answer to the question posed in the earlier blog: “What are my statistical principles?”

  4. I like this one:

    Just cos you have honesty, transparency, causal identification, a statistically significant comparison, a clear moral purpose, and publication in a top journal, that doesn’t necessarily mean you’ve discovered anything at all about the world.

    • Are they honest, though? Did they really hypothesize that the US election would have an effect on the sex ratio in Canada? Or did they see a dip in the data they looked at and then come up with that hypothesis?

  5. So a woman was pregnant for 4 months, then the election happened, and 5 months later she gives birth. As far as I understand, the sex is determined at day 1 . . . so unless there is some advanced biology I am not understanding here . . . I would expect the change to happen 9+ months after the election.

    • Jan:

      From a quantitative perspective, none of this is happening on a scale that is measurable in this study, so all this is kind of empty talk, but . . . just to clarify: yes, the sex of the embryo is determined at conception, but the sex ratio at birth is also determined by which embryos make it all the way to birth. Lots can happen in the first 10 weeks or so. Not much happens after 4 months, though.

  6. It also strikes me as a rather low-quality hypothesis. How much does stress even affect the sex ratio at birth? Surely this can be tested by putting pretty extreme pressures on animals.

    Beyond that, the claim that “liberal areas” would have higher stress following the election of Donald Trump as compared to “conservative areas” also seems dubious, especially since they’re looking at Canada, where Trump’s policies have no direct effect. Moreover, even in the US, I could see a reasonable case that the election would be just as stressful for conservatives.

    Finally, they’re not even comparing liberal and conservative individuals; they’re comparing areas, most of which probably have only slight conservative or liberal majorities. (To be fair, this last point could suggest a larger effect size than they measured, but it also means that the causal effect could be related to a confounder of regional politics, like urban vs. rural differences.)

  7. So much of the discussion of inferior science on this blog comes back to cargo-cult science. The authors were meticulously following the ritual they’d been taught, and the reviewers only checked that the authors’ performance was close enough to that ritual. But I wouldn’t absolve them so easily. A scientist is a professional skeptic, driven by a deep curiosity about the world. Don’t they have a responsibility to point some of that skepticism, some of that curiosity, at their own understanding of science? Instead, they seem satisfied that what they are doing is science just because it’s what they were taught to do, or because it’s what they have incentives to do, or because it’s what their peers do. The unexamined methodology is not worth applying.
