Nooooooooooooo (another hopeless sex ratio study)

Two people pointed me to this one:

In Malta, during the study period, there were 8914 live births (M/F 0.5139; 95% CI 0.514–0.524). There was a significant dip (χ2 = 5.1, p = 0.02) of the M/F ratio 4 months after DCG’s assassination to 0.4529. This translates to an estimated loss of 21 male live births.

In Ireland, during the study period, there were 102,368 live births (M/F = 0.5162; 95% CI 0.513–0.519). There was a significant dip (χ2 = 4.5, p = 0.03) of the M/F ratio 4 months after VG’s assassination to 0.5. This translates to an estimated loss of 72 male live births.

I have no problem with people reporting raw data and performing their statistical analyses, but I have a big big problem with this conclusion of the article:

Assassinations of investigative journalists may have important population mental health implications.

The trouble is that actual differences in Pr(girl) are very small, and observed differences are very noisy compared to those actual differences.

Here’s a baseline. A 0.5 percentage point change in Pr(girl) is about the biggest you’ll ever get, outside of sex selection or major famines. The larger of the two analyses in this paper takes 102,000 births over two years and analyzes the monthly data. That’s 102,000/24 = 4250 births per month. So the sampling standard deviation of Pr(girl) in a month will be sqrt((1/2)*(1/2)/4250) = 0.008, which is bigger than any real change you could expect to see.
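If you want to check this arithmetic, here it is as a few lines of Python (a sketch assuming, as above, that the births are spread evenly across the 24 months):

```python
import math

births_total = 102_000  # larger of the two analyses in the paper
months = 24
n_month = births_total / months  # 4250 births per month

# Sampling SD of a monthly proportion, using p = 1/2, which maximizes
# p*(1-p) and so gives an (essentially tight) upper bound here
sd_month = math.sqrt(0.5 * 0.5 / n_month)
print(f"{n_month:.0f} births/month, monthly sampling SD = {sd_month:.4f}")
# -> 4250 births/month, monthly sampling SD = 0.0077
```

Noise of about 0.8 percentage points per month swamps the 0.5-point shifts that count as enormous in this literature.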

Now at this point you might say: Why not put together a larger dataset over more years with data from more countries to get a big enough sample size? And my response would be: Sure, do it if you want, but don’t expect anything, as I think any real effects would be in the range of 0.1 percentage points, or 0.01 percentage points, or something like that. It’s just not plausible that this one piece of news would have such a big effect on behavior. Also, there’s other news happening at the same time: the performance of the national economy, releases of influential movies and pop songs, wars breaking out all over the world, refugee crises, economic shocks, and all sorts of other local and national news. The whole thing makes no sense.

But . . . how did they get the statistical significance, the p-values of 0.02 and 0.03? That’s easy. They looked at 24 months and found the biggest change. Forking paths.
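To see just how forgiving that procedure is, here’s a quick simulation. It’s only a sketch under the simplest assumptions: 24 months of 4250 births each, a constant true proportion of boys of 0.516, and a scan for the most extreme month:

```python
import numpy as np

rng = np.random.default_rng(0)
months, n, p, sims = 24, 4250, 0.516, 10_000
se = np.sqrt(p * (1 - p) / n)  # per-month sampling SD

hits = 0
for _ in range(sims):
    phat = rng.binomial(n, p, size=months) / n
    # the forking-paths step: test every month, keep the most extreme one
    if np.max(np.abs(phat - p) / se) > 1.96:
        hits += 1

print(f"Pr(some month is 'significant' under pure noise) = {hits / sims:.2f}")
# roughly 1 - 0.95**24 = 0.71
```

Under pure noise you get at least one nominally significant month about 70% of the time, so a p-value of 0.02 or 0.03 on the single biggest dip is exactly what no-effect data will hand you.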

Let me emphasize that “forking paths” is not my main criticism of this study. My main criticism is that it’s hopeless, they’re just using too noisy a measurement. The reason to bring up forking paths or p-hacking is to close the loop, to explain why what is offered as evidence is not good evidence at all.

I agree with the author of the above-linked article that the murder of investigative journalists is a horrible thing. I just don’t think anything is gained by trying to study the topic by linking it to a dataset of what are essentially random numbers.

Again, the trouble with using the sex ratio as an outcome is that (a) it has close to zero variation, and (b) you’re looking at random binary outcomes so the data are really noisy.

P.S. I think that, beyond basic statistical misunderstandings, there are a few more things going on here:

1. Social scientists are always looking for something to publish about, and journalists are always looking for material. So all sorts of crap gets out there, some of which makes its way all the way up the NPR/Gladwell/TED ladder.

2. There’s an attitude—something that might be called a statistical ideology—that causal identification plus statistical significance equals discovery. This ideology is so strong that it can overwhelm people’s critical faculties. Remember Daniel Kahneman’s notorious statement, “You have no choice but to accept that the major conclusions of these studies are true.” Kahneman has since retracted that statement, and good for him—my point here is that the ideology was so strong that it was leading him into the inferential equivalent of stepping off a cliff because he believed that he had wings. I mention Kahneman here not because he had the worst case of it! It’s the opposite: he’d published influential research on researchers’ misunderstanding of statistics; if he can be blinded by ideology or received wisdom in this way, anyone can.

3. Sex-ratio stories in particular seem to hold an appeal for people, one that I guess has to do with very basic gender essentialism, what I’ve called schoolyard evolutionary biology. The differences between boys and girls are so evident that it just seems to make sense to many people to suppose that sex ratios are correlated with so many things (as in the notorious claim promoted by Freakonomics that “good-looking parents are 36% more likely to have a baby daughter as their first child than a baby son”).

26 thoughts on “Nooooooooooooo (another hopeless sex ratio study)”

  1. I’m just trying to distill the basic problem with relating the birth ratio to particular historical events (aside from, as you say, large-scale population disruptions like wars, famines, etc.). Ultimately, it seems to boil down to misunderstanding the null hypothesis.

    1) Many potential sources of variability beyond the one chosen. More things are different from month to month than just the event chosen by the researchers, so even if there is a “real” difference, you can’t know why it was there.
    2) Inappropriate scale of replication. Researchers see large N’s (102,000 is a lot!) and assume this gives them small standard errors. But, as you point out by breaking it down month by month, the individual birth is not the scale of replication; the ratio calculated for the entire month is. So really their “sample size” is 24, not 102,000.

    I bring this up because I worry that the two reasons at the end of the post (near-zero variation and noisiness of outcomes) may be opaque to the people working on sex ratios. First, the claim of near-zero variation is empirical; can’t the researchers point to their scatterplots and say, “but clearly there is variation!”? Sure they could, but the problem is that the variation is either unsystematic or affected by so many factors that any single factor will not help explain it. Second, the scale-of-replication point may help illustrate why people think they have lots of power when, as you say, the scale of the noise is considerably larger than most would assume from naively eyeballing the sample size.
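    To make the scale-of-replication point concrete, here’s the check I’d want to see: do the 24 monthly proportions vary any more than binomial sampling alone predicts? A sketch with simulated placeholder counts (the paper’s actual monthly counts would go in `boys` and `totals`):

    ```python
    import numpy as np

    # Placeholder data: swap in the paper's 24 monthly counts
    rng = np.random.default_rng(1)
    totals = np.full(24, 4250)
    boys = rng.binomial(totals, 0.516)

    phat = boys / totals
    p_pool = boys.sum() / totals.sum()

    # Pearson chi-square for homogeneity across months: under pure
    # binomial noise this is roughly chi-squared on 23 df (mean ~23)
    chi2 = np.sum(totals * (phat - p_pool) ** 2) / (p_pool * (1 - p_pool))
    print(f"chi2 = {chi2:.1f} on {len(phat) - 1} df")
    ```

    If that statistic sits near its degrees of freedom, the monthly ratios vary no more than coin flips would, and there is no systematic month-to-month signal left for any event to explain.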

  2. The article is short, but some quick Googling indicates that the timing of a “dip” (in quotes because the existence of one is itself what you’re debating) is relevant for testing the male-conception vs. spontaneous-abortion hypotheses for why gestational stressors could affect the sex ratio. Indeed, they just did chi-squared tests to see if *any* of the months were inconsistent with the others. It would have been a stronger theoretical test to use months that were a priori prescribed by those competing hypotheses (e.g., 3 or 7 vs. normal, per some statement in a 9/11 sex ratio study).

    But I wouldn’t exactly call this forking paths or p-hacking. It’s definitely a shotgun approach, and correction for multiple comparisons would help address that problem (though at what point is testing competing theories a shotgun? testing two theories vs. each other is probably not, but would testing four be? does it even matter? how would previous theories [presumably developed in more clinical domains] establish 3 and 7 months as the prime candidates for gestational stress mechanisms without comparing them to other months?). Assuming a charitable interpretation of what they did, they didn’t really fork here, other than in choosing one whole analytical approach as opposed to one we might prefer (not for statistical reasons really, just for theoretical ones); they didn’t do one thing, then based on that, reasonably another thing, and then another, and wind up down a path that actually has a lot of hidden dependencies. And they were clear that they made all comparisons. [Of course, the analysis also controls for nothing that could concomitantly vary, so there are identification concerns here beyond any allegations of questionable research practices. And who knows if the assumption of good intentions is the right one…]
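    For what it’s worth, the correction itself is trivial: with 24 candidate months, holding the family-wise error rate at 5% means each month’s p-value has to clear 0.05/24:

    ```python
    months = 24
    threshold = 0.05 / months  # Bonferroni-adjusted per-month level
    print(f"per-month threshold: {threshold:.4f}")  # 0.0021
    for p in (0.02, 0.03):  # the paper's reported p-values
        print(f"p = {p}: survives correction? {p < threshold}")  # False, both times
    ```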

    That said, it’s not your main criticism anyway, and your main criticism, that the sampling error is so large as to render testing this whole question without enormous N meaningless, is appealing at first blush. The natural variation in monthly sex-ratio sample proportions is so high that detecting anything significantly different from the norm would require an implausibly large effect (a 1.6 pp difference!). On the other hand, a sampling SD (i.e., SE) of .008 implies a 4-sigma range of 3.2 pp, so maybe that “implausible” deviation from the mean isn’t so implausible after all (and why their p-value is pretty close to just-significant). Perhaps the p-value has indeed done its job to help summarize the pattern in Fig. 2 as offering not a lot of evidence against the null; with a weak prior on the null, you update away from the null and feel pretty good about it; with a strong prior on the null, you should still update away from the null, but you might not feel very confident about anything in particular at the end.
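    As a sanity check, the Ireland χ² is recoverable from the summary numbers alone. A sketch, assuming the dip month had an average share of the births (the paper’s exact monthly count would shift this slightly):

    ```python
    import math

    n_month = 102_368 / 24   # ~4265 births in the dip month (assumed average)
    p0, p_dip = 0.5162, 0.5  # overall vs. dip-month proportion male

    z = (p_dip - p0) / math.sqrt(p0 * (1 - p0) / n_month)
    print(f"z = {z:.2f}, chi2 = {z * z:.1f}")  # z = -2.12, chi2 = 4.5 (paper: 4.5)
    print(f"implied male-birth deficit: {n_month * (p0 - p_dip):.0f}")  # ~69 (paper: 72)
    ```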

    • LOL. I skimmed right over that. We need a “When I’m Sixty-Four”-style paper in which subjects are exposed either to “The Boys Are Back in Town” or “Waiting for a Girl Like You” in utero.

    • Presumably, if such an effect existed, it would be caused by spontaneous abortion. The timing isn’t really the issue. A famine could certainly affect the viability of already-conceived fetuses, or hormone levels in the mother (from, say, male fetuses) might help maintain the pregnancy.

      • I believe I’ve read that the timing of intercourse with respect to ovulation plays a role in the probability of each sex; there seems to be a difference between “male” and “female” spermatozoa. So in times when men are too stressed to have sex often (e.g., during a war), the sex ratio can change.

        I’ve never heard of a mechanism that makes “natural” miscarriages sex-specific.

  3. I wasted a minute looking at this awful, awful paper. Is there no one in the path from researching to writing to publishing who points out how idiotic this is? Anyway, to save others the trouble, I copied and posted Figure 2:
    https://imgur.com/vdN4JHs
    It is absolutely stunning that anyone could look at this and think that it isn’t just noise. Perhaps more constructively, I’m thinking of writing a paper that Veronica Guerin’s assassination caused a large *rise* in the M/F ratio in the month that the assassination happened, which is just as scientifically valid based on the data (Fig. 2). In fact, this is an even more important finding since many of those births occurred in the days before her death, indicating a means of predicting the future.

  4. I’m convinced this is an amazing parody.

    The author’s previous publications are mundane pediatrics and crossword puzzles. I can’t fathom what would motivate someone to first even notice a slightly below average monthly birth ratio and second make the logical jump to assassinations four (!!!) months earlier.

  5. Was there an established theoretical literature explaining why trauma leads to changes in sex ratios? That’d be the bare minimum for passing the laugh test as far as I’m concerned.

    • Matt:

      There is both theory and data suggesting that trauma will be associated with an increase in the proportion of girl births. The problem is that (a) the scale of any such effect is too small to be detectable here, and (b) the event considered in that analysis is just one of many traumas of that sort that are happening.

      • “trauma will be associated with an increase in the proportion of girl births.”

        So interesting! Is there a “trauma scale” that predicts the proportion of girl births? If the trauma scale goes negative, do male births exceed female births? Perhaps a regression discontinuity study is appropriate, with the discontinuity at zero trauma. Seems like we should be able to test this quite easily: there was widespread rejoicing and optimism after Obama was elected, and trauma and despair after Trump was elected. Wait!! That’s two regression discontinuity studies! The trend of the ratio should be sharply discontinuous at exactly four months(!) after each election, with a downward spike and falling trend after the 2008 election (male births spike, then continue to rise with the initial Obamoptimism), and an upward spike and rising trend (female births spike and rise with increasing TrumpTrauma) after 2016!

        Wow. I should be a social scientist. But maybe I don’t have to be anything to do TED talks and get on NPR!

        • Jim:

          The cleanest data on trauma and sex ratio come from the famine in the Netherlands in 1944, after which more girls were born than boys. This is a very rare historical example where the proportion of girl births was greater than 50%. And it’s a country with a big short-term spike in trauma and high-quality birth data.

          No, I don’t think that the purported trauma (among conservatives) of Obama winning or the purported trauma (among liberals) of Trump winning would have any noticeable effects on the sex ratio. That indeed would fall into the range of silly social science.

        • Incidentally, I spent much of the last four years getting caught up on what’s going on in statistics and data analysis. While I didn’t quite get to Bayesian hierarchical modelling, I got close enough to draw my conclusions about the field in general. I concluded that modelling and statistical analysis are excellent for solving problems – but only on the condition that the results are **constantly** checked against reality. This condition excludes much – most? – of academic social science, which – like this study – doesn’t even analyze reality, let alone compare results to it.

          The real tragedy in all of this is that this abuse of science is harmful. Society has real problems. It deserves real science to solve them.

        • Well, when I read your earlier post, I thought it was probably meant in jest. But I wasn’t sure. And that’s the problem: so much of what is out there is so off-track that it becomes a parody of itself. That makes it hard to know when remarks of that nature are meant seriously and when they aren’t.

          And yes, this abuse of science is very harmful.

        • I thought the thing about TED talks would give it away! :) It’s tough online, though. Cowen at MR makes little jokes all the time that people miss.

        • jim –

          Not to be an apologist for abuse of science…

          But I always wonder when I read such gloomy reactions to the state of science, whether they’re grounded in the full context.

          As much scientific abuse as there is – and I’m not doubting there’s serious harm, in an opportunity-cost kind of way – if you had to guess, do you think that on balance the abuse of science has a larger negative impact than the positive impact from the non-abusive science?

        • “such gloomy reactions to the state of science”

          My comments don’t encompass all of science. What concerns me is work associated with statistical analysis and forecasting, especially but not only in the social sciences. It’s hard to put an exact fence around it, but I’m not concerned about the state of materials science research, which is firmly based in repeatable experiments.

    • Paul:

      To answer your question, see the P.S. in the above post. Beyond the careerist motivations of authors, I think that journals publish this sort of thing because it fits a folk biology of gender essentialism. When a paper makes a claim that people want to believe, they won’t scrutinize it so carefully.

      Also, to return to the careerist motivations of authors: We don’t know how many (if any) journals rejected this manuscript before it was finally published in Elsevier’s Early Human Development journal. Maybe the usual suspects (PNAS, Lancet, etc.) said no. All that a paper needs to be published is for some journal somewhere to say yes.

  6. There were still more males born over the year. Is there never supposed to be a month where that’s reversed? I saw you mentioned a need for more data; I assume you included this idea in that.

    • Jo:

      The usual sex ratio is something like 48.8% girls. If it were something that consistently changed it to 49.8% girls, that would be notable. Realistically, there’s no way to think that news events would have anything like that effect. People often seem to have an intuition that the human sex ratio is very malleable, but the evidence says it’s not.
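      To put a number on that: even a consistent shift of a full percentage point, from 48.8% to 49.8% girls, would take on the order of 20,000 births to detect reliably. Here’s a rough normal-approximation power calculation, as a sketch:

      ```python
      import math

      p0, p1 = 0.488, 0.498          # usual vs. hypothetically shifted proportion of girls
      z_alpha, z_power = 1.96, 0.84  # two-sided 5% test, 80% power

      # Standard one-sample sample-size formula for a proportion
      n = ((z_alpha * math.sqrt(p0 * (1 - p0)) +
            z_power * math.sqrt(p1 * (1 - p1))) / (p1 - p0)) ** 2
      print(f"births needed: {n:.0f}")  # ~19,600
      ```

      And that’s for a full point, which is already implausibly large; for effects of 0.1 percentage points, multiply by 100.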
