Studying sex ratios is just a lot harder than you think: effects are tiny and variation is large.

A science journalist sent me an email:

Is this your sort of paper? It’s on sex ratios at birth.

Seems fascinating to me but I need the thoughts of a statistician. If you have time and if this is your kind of hating, I’d love to talk to you. It’s embargoed in Science Advances for Thursday.

This all happened a while ago, so the paper is no longer embargoed; hence the link above.

Here’s how I responded to the journalist:

I’m not sure about the details of this analysis but I’m skeptical. For one thing, references 8, 9, 10, 11, and 12, which they cite and which purportedly give evidence for systematic variation of sex ratios, have small enough sample sizes that their results are essentially pure noise (we wrote a paper about this a few years ago) so it is already a bad sign that the authors cite these papers uncritically. They do say, “none of these hypotheses have been confirmed in large epidemiological studies,” but that understates it, in that those papers should never have been taken seriously in the first place.

That said, there is some evidence from population studies that sex ratios vary a little bit. I can’t remember all the details, but I think that the probability of girl birth is about 0.5 percentage points higher for African-American parents than for white parents, also the probability of girl birth is slightly higher for older mothers.

I’ll just say a few things:

1. For some reason, people looove to study sex ratios and they’re always trying to see if they can predict if a baby will be a boy or a girl.

2. The process is essentially random, i.e., it’s unpredictable. The average probability of girl birth is approximately 48.8%. Even if it’s 49.3% for some mothers and 48.3% for others, it’s still almost entirely random.

3. It’s easy to find patterns in random data, as evidenced by references 8, 9, 10, 11, and 12 and also discussed in this classic 2011 paper by Simmons et al.

4. Differences in sex ratio are small enough that you need huge sample sizes to detect any signal amidst all the noise, and even small data problems will overwhelm any signal.

5. I flat-out don’t believe their claim that if you have three boys, that there’s a 61% chance your next baby will be a boy. I could be wrong, but based on my experience with the literature, I just don’t believe it. I don’t know whether this is coming from selection bias or what.

The journalist responded:

I meant to say if this is your kind of thing. But I guess I must have been anticipating hating because that’s what I wrote.

If you remain interested in the statistical challenges of estimating variation in sex ratios, I recommend my 2009 article with David Weakliem and this 2001 post in Chance News by the late Laurie Snell.

Studying sex ratios is just a lot harder than you think: effects are tiny and variation is large.

From polling and clinical trials and psychology experiments and other things, we’re used to the idea that a few thousand people is a large sample. But a few thousand is not a large sample if you’re estimating tiny effects. Differences in Pr(girl) of 0.1 percentage point or even 0.5 percentage point are small compared to the effects that we study in medicine, politics, and psychology, and a sample size that’s sufficient to study effect sizes of 5 percentage points or 10 percentage points or more won’t be enough here.
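To put rough numbers on this, here is a minimal sketch using the textbook normal approximation for comparing two proportions. The function names, the 1,500-per-group example, and the conventional 5% two-sided test with 80% power are my assumptions for illustration; they are not taken from any of the papers discussed here.

```python
import math

def se_difference(p, n_per_group):
    """Standard error of the difference between two independent
    sample proportions, each close to p, with n_per_group births each."""
    return math.sqrt(2 * p * (1 - p) / n_per_group)

def n_per_group_needed(p, diff, z_alpha=1.96, z_power=0.84):
    """Approximate births per group needed to detect a true difference
    `diff` with a 5% two-sided test and 80% power (normal approximation)."""
    return math.ceil(2 * p * (1 - p) * ((z_alpha + z_power) / diff) ** 2)

p_girl = 0.488  # approximate average Pr(girl) quoted in the post

# A "large" survey: 3,000 births split into two equal groups.
print(f"SE with 1,500 per group: {100 * se_difference(p_girl, 1500):.2f} percentage points")

# Per-group sample sizes for differences of 5, 0.5, and 0.1 percentage points.
for diff in (0.05, 0.005, 0.001):
    print(f"difference of {100 * diff:.1f} points: {n_per_group_needed(p_girl, diff):,} per group")
```

Under these assumptions, 3,000 births give a standard error of about 1.8 percentage points, several times larger than a 0.5-point effect. Detecting a 0.5-point difference takes on the order of 150,000 births per group, and a 0.1-point difference takes millions.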

The trouble–and we see it over and over again–is that what is reported are statistically significant results, which by necessity will be large. So people do these sex-ratio studies, and they find some huge effects. The effects are “statistically significant” and “practically significant.”

What could possibly go wrong? The answer is that (a) it’s not hard to find statistically significant results from pure noise, (b) the signal here is small enough that it can be overwhelmed by biases in data collection, and (c) with such noisy studies, anything that’s statistically significant will automatically be practically significant–indeed, it will be a drastic overestimate of any real effect.
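Point (c) can be checked with a short calculation. The sketch below assumes the estimate is normally distributed around a true difference of 0.3 percentage points with a standard error of 1.8 percentage points (roughly what two groups of 1,500 births would give). Both numbers, and the name `significance_filter`, are hypothetical choices for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def significance_filter(true_effect, se, z_crit=1.96):
    """For an estimate ~ Normal(true_effect, se), return
    (probability the estimate is statistically significant,
     expected |estimate| among significant results / true_effect)."""
    a = z_crit - true_effect / se    # standardized upper cutoff
    b = -z_crit - true_effect / se   # standardized lower cutoff
    p_upper = 1.0 - normal_cdf(a)    # significant, positive sign
    p_lower = normal_cdf(b)          # significant, negative sign
    power = p_upper + p_lower
    # Mean absolute estimate, restricted to the two significant tails.
    mean_abs = (true_effect * (p_upper - p_lower)
                + se * (normal_pdf(a) + normal_pdf(b))) / power
    return power, mean_abs / true_effect

power, exaggeration = significance_filter(true_effect=0.003, se=0.018)
print(f"chance of a significant result: {power:.3f}")
print(f"average overestimate when significant: {exaggeration:.1f}x")
```

With these assumed inputs, only about 5% of studies reach significance, barely more than would by chance alone, and the ones that do overstate the true effect by more than a factor of ten on average. The smallest estimate that can clear the significance bar is 1.96 × 1.8 ≈ 3.5 percentage points, so any published “finding” has to be huge.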

The result is a series of apparently bulletproof statistical results implying large effects, leading to an entire subfield full of noise (as in references 8, 9, 10, 11, and 12 of the above-linked paper). It’s a vicious circle, a perpetual motion machine of apparent success. And the claimed huge effects get amplified in the news media (yeah, I’m looking at you, Freakonomics!), motivating more studies along these lines.

On the plus side, the journalist contacted me first, which injected a note of skepticism into the proceedings. On the minus side, there’s a selection bias by which the credulous media outlets (yeah, I’m looking at you, BBC!) promote the claim uncritically, while the savvier journalists know to lay off it. The net result is that the coverage that does come out is uniformly unskeptical, or nearly so.

To return to the example at hand: if the sex ratio of humans varied like the sex ratios of Seychelles warblers, then these studies would be just fine. The problem is the effect size. Population studies find tiny effect sizes–not zero, but less than 0.5 percentage points in most settings. Studies based on surveys find huge effect sizes, which is what we’d expect from noisy estimates.

I also don’t see any substantive theory behind the claims, other than that it seems very intuitive to people that (1) the sex of a baby should be predictable and (2) sex ratios should vary a lot.

I think the basis for that first intuition is gender essentialism: men and women are so clearly different that it just stands to reason that they should somehow be externally distinguishable (in the pre-ultrasound era) even in the womb, and that the sex of the baby should be influenced in various ways by characteristics of the parents.

I think the basis for the second intuition is what Tversky and Kahneman called “the law of small numbers”: everybody knows someone with several siblings, all of the same sex. People don’t have a sense of the huge sample size that would be needed to learn anything useful from such data, even setting aside selection and recall issues. Also, gender essentialism: it just stands to reason that parents of boys should be more masculine, in some way, than parents of girls. This bit of intuition can be backed up by endless speculation of the sort that is called evolutionary psychology and which plays well in Freakonomicsland.

Regarding the particular article under discussion: I can’t say for sure they’ve got nothing there. I can just say that I don’t believe it, because their paper falls in a long line of research finding large and statistically significant effects from noisy sex-ratio data. Also it’s a bad sign that they cited several papers from the literature that are definitely full of crap.

I agree with the authors that variation in Pr(girl) is possible–indeed, the variation can’t be exactly zero, as we know that the probability varies by ethnicity, maternal age, and other conditions. I’ve just never seen any evidence that the natural variation would be large enough to be detectable from this sort of study, and I’ve seen lots of this sort of study that present such claims with a great deal of unwarranted confidence.

P.S. Here’s another example.

17 thoughts on “Studying sex ratios is just a lot harder than you think: effects are tiny and variation is large.”

  1. I’m absolutely shocked that they found that mother’s hair color was not associated with baby’s sex. Of course, even that “finding” is wrong – the absence of a statistically significant effect is not evidence of a lack of effect. But why quibble about details. BMI did not have any association with baby’s sex either, which should make Daniel Lakeland happy (since he has often demonstrated the deficiencies of BMI as a meaningful measure).

    • Adiposity, and waist to height, my favored measures, would have very strong correlations with baby sex, at least by say age 18.

      Biological girls have higher percentage body fat than boys at maturity. It’s just an observed fact about the population with a known mechanism.

      Adiposity is

      (g * height * rho_w * waist^2) / (4 * pi * weight)

      It’s the dimensionless ratio of the density of water to the density of a cylinder with the same mass as the person, same circumference as the waist, and same height as the person.
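As a quick check of the formula above, here is a minimal sketch in SI units. The function name and the example person (1.75 m tall, 0.85 m waist, 70 kg) are made up for illustration; weight is taken to be a force (mass times g), which is why g appears in the numerator.

```python
import math

G = 9.80665         # standard gravity, m/s^2
RHO_WATER = 1000.0  # density of water, kg/m^3

def adiposity(height_m, waist_m, weight_newtons):
    """Density of water divided by the density of a cylinder with the
    person's mass, height, and waist circumference (dimensionless)."""
    return (G * height_m * RHO_WATER * waist_m ** 2) / (4 * math.pi * weight_newtons)

# Hypothetical person: 1.75 m tall, 0.85 m waist, 70 kg (weight = mass * g).
print(round(adiposity(1.75, 0.85, 70 * G), 3))
```

The cylinder’s volume is height × waist² / (4π), so its density is 4π × mass / (height × waist²); dividing the density of water by that gives the expression above, and the g cancels once weight is written as mass × g.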

      • Hmm. Maybe there isn’t data going back to WWI? Say, 1900 onwards? I’d be willing to bet that the sex ratio tilts towards more boys immediately after major wars. I’d also be willing to bet that no one has the slightest idea why that happens.

        • Adrian:

          A quick google turns up this paper, “Historical trends in the sex ratio at birth,” by Anouch Chahnazarian, from what looks to be 1990, with graphs of recorded sex ratios going back to 1915 and earlier for various countries. There is no consistent pattern of sex ratios tilting toward more boys immediately after major wars.

          Different things happen in different countries, and there is a concern about data quality. In Chahnazarian’s words, “Differential reporting of live births by sex is one common cause of distortions of the SRB [sex ratio at birth]. Female births are more likely to be underreported than male births. Such underreporting would tend to raise the observed SRB.” And indeed we see that in the graphs in the paper.

          Also: “Another source of distortion lies in the underreporting of a live birth rapidly followed by the death of the child. Since neonatal mortality rates are higher for boys than for girls, it is often believed that such underreporting could lead to relatively higher underreporting of male live births . . . However, in some societies, it is likely that the underreporting of female babies who die soon after birth is higher than that of male babies who die soon after birth. . . . Sex selective infanticide is more likely to affect girls than boys . . . Misclassification of births as either live births or stillbirths may also be a source of error.”

          The paper is interesting all the way through. Chahnazarian does write, “A behavioral factor, coital frequency, is also thought to determine the SRB, through the relationship between the timing of intercourse during the menstrual cycle and the probability of a male conception. . . High coital frequency has been invoked to explain the rising SRB during and after wars. . . .” But I don’t really see that in the data.

          My go-to explanation for variations in the probability of boy or girl birth is that male fetuses are weaker than female fetuses, so when times get tough in there (maternal age, poverty, famine, disease), the relative proportion of boy births will decline, and when things get easier (younger mother, better living conditions, less disease), the relative proportion of boy births will increase. I’m not saying that coital frequency and other such factors have no effect (I haven’t looked at these studies in detail); I just think that those sorts of explanations come to mind more easily to people, as compared to the more indirect but ultimately more plausible (to me) explanations based on fetal conditions.

  2. As I have learnt here from posts such as this, “significant” results can pop out of noise now and then, even with the best intentions, despite weak data. It would be nice to have a model that, instead, reflects the uncertainty in small samples and tells you back that you just can’t know from such data.
    Maybe Bayesian models would help, or regularisation. I’m not so sure how I’d like it to be, though.

    • That’s the entire playbook of statistics. Whether one uses Bayesian or frequentist methods, one can readily get an answer to whether there is sufficient evidence. The practical problem, though, is that many people won’t take “no” (in this case, “not enough evidence”) as the answer. Hence, they invent such concepts as “directional” evidence.

  3. Let’s say this sex ratio stuff becomes very important for some reason. Aliens will give us antigravity if we can give the exact sex ratios for various conditions.

    Will the ensuing Manhattan Project-level work rely on or benefit from these current studies? Or will we take an entirely different approach requiring different kinds of data, i.e., involving quantitatively modelling egg/sperm/hormonal activity, viability, etc.?

  4. Hi Andrew: The problems you have often highlighted about the human sex ratio (and the questions about its variation, covariates, and evolutionary models for such) have been the norm for many decades. Your sophisticated view of the statistical inference adds some new wrinkles to the inferences.
    My 1982 book THE THEORY OF SEX ALLOCATION has a section on human sex ratio quite in line with what you write. Even birds and mammals gave little evidence of adaptive sex ratio modification… way back in those dark ages.
    Sex ratio evolution and its extension to hermaphrodites is the most empirically successful example of Evolutionary Game Theory (aka ESS models, evolutionary genetic models, etc.). We have zillions of real tests, and many with very big sex ratio shifts, strongly related to environmental ‘events’ with great effects on individual fitness. I attach one classic study here: https://digitalrepository.unm.edu/biol_fsp/24/

    Anyone interested in sex ratio and its evolution/control should consult my book or Stu West’s book, both Princeton Monographs in Population Biology.

  5. Just for fun, here is a recent overview paper on the interaction of sex ratio, environmental effects, and sex determining systems [of which there are many beyond the simple XX, XY]. It is a fascinating area of study at the interface of physiology, genetics, population biology, and evolution, and natural selection on sex ratio sits at the center.

    https://georges.biomatix.org/storage/app/uploads/public/60c/e7c/320/60ce7c3200e13162292219.pdf
