I was asked by a reporter to comment on a paper by Satoshi Kanazawa, “Beautiful parents have more daughters,” which is scheduled to appear in the Journal of Theoretical Biology.
As I have already discussed, Kanazawa’s earlier papers (“Engineers have more sons, nurses have more daughters,” “Violent men have more sons,” and so on) had a serious methodological problem in that they controlled for an intermediate outcome (total number of children). But the new paper fixes this problem by looking only at first children (see the footnote on page 7).
Unfortunately, the new paper still has some problems. Physical attractiveness (as judged by the survey interviewers) is measured on a five-point scale, from “very unattractive” to “very attractive.” The main result (from the bottom of page 8) is that 44% of the children of surveyed parents in category 5 (“very attractive”) are boys, as compared to 52% of children born to parents from the other four attractiveness categories. With a sample size of about 3000, this difference is statistically significant (2.44 standard errors away from zero). I can’t confirm this calculation because the paper doesn’t give the actual counts, but I’ll assume it was done correctly.
Choice of comparisons
Not to be picky, but it seems somewhat arbitrary to pick out category 5 and compare it to categories 1-4. Why not compare 4 and 5 (“attractive” or “very attractive”) to 1-3? Even more natural (from my perspective) would be to run a regression of proportion boys on attractiveness. Using the data in Figure 1 of the paper:
> library(arm)   # for the display() function
> attractiveness <- c(1, 2, 3, 4, 5)
> percent.boys <- c(50, 56, 50, 53, 44)
> display(lm(percent.boys ~ attractiveness))
lm(formula = percent.boys ~ attractiveness)
coef.est coef.se
(Intercept) 55.10 4.56
attractiveness -1.50 1.37
n = 5, k = 2
residual sd = 4.35, R-Squared = 0.28
So, having a boy child is negatively correlated with attractiveness, but this is not statistically significant. (Weighting by the approximate number of parents in each category, from Figure 2, does not change this result.) It would not be surprising to see a correlation of this magnitude, even if the sex of the child were purely random.
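For completeness, here is what that weighted fit looks like. This is just a sketch: the counts below are placeholders I made up for illustration, not the actual numbers from Figure 2.

> n.parents <- c(100, 400, 1500, 800, 200)   # hypothetical category counts, not the Figure 2 values
> display(lm(percent.boys ~ attractiveness, weights = n.parents))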
But what about the comparison of category 5 with categories 1-4? Well, again, this is one of many comparisons that could have been made. I see no reason from the theory of sex ratios (admittedly, an area on which I am no expert) to pick out this particular comparison. Given the many comparisons that could be done, it is not such a surprise that one of them is statistically significant at the 5% level.
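To make the multiple-comparisons point concrete, here is a small simulation sketch (entirely my own construction, with made-up category sizes): if the sex of each child is purely random, how often does at least one of a handful of natural cut-point comparisons come out "significant" at the 5% level?

# Simulation sketch: purely random sexes, several possible comparisons.
# The category sizes are hypothetical, not the survey's actual counts.
set.seed(1)
n.per.cat <- c(100, 400, 1500, 800, 200)
cutpoint.signif <- function(boys, n, hi.cats) {
  # two-proportion z-test: categories in hi.cats vs the rest
  p1 <- sum(boys[hi.cats]) / sum(n[hi.cats])
  p2 <- sum(boys[-hi.cats]) / sum(n[-hi.cats])
  p  <- sum(boys) / sum(n)
  se <- sqrt(p * (1 - p) * (1 / sum(n[hi.cats]) + 1 / sum(n[-hi.cats])))
  abs(p1 - p2) / se > 1.96
}
any.signif <- replicate(10000, {
  boys <- rbinom(5, n.per.cat, 0.5)             # no attractiveness effect at all
  any(cutpoint.signif(boys, n.per.cat, 5),      # category 5 vs 1-4
      cutpoint.signif(boys, n.per.cat, 4:5),    # categories 4-5 vs 1-3
      cutpoint.signif(boys, n.per.cat, 3:5),    # categories 3-5 vs 1-2
      cutpoint.signif(boys, n.per.cat, 1))      # category 1 vs 2-5
})
mean(any.signif)   # noticeably more than 0.05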
Measuring attractiveness?
I have little to say about the difficulties of measuring attractiveness except that, according to the paper, interviewers in the survey seem to have assessed the attractiveness of each participant three times over a period of several years. I would recommend using the average of these three judgments as a combined attractiveness measure. General advice is that if there is an effect, it should show up more clearly if the x-variable is measured more precisely. I don’t see a good reason to use just one of the three measures.
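To illustrate why averaging should help, here is a toy simulation (my own sketch, not the survey data) showing the usual attenuation from measurement error: a single noisy rating shrinks the estimated slope toward zero, while the average of three ratings recovers more of the true effect.

# Attenuation sketch with made-up numbers: true effect of 0.1
set.seed(1)
n <- 3000
true.attract <- rnorm(n)
y <- 0.1 * true.attract + rnorm(n)                                   # the "true" relationship
one.rating <- true.attract + rnorm(n)                                # a single noisy judgment
avg.rating <- true.attract + rowMeans(matrix(rnorm(3 * n), n, 3))    # average of three judgments
coef(lm(y ~ one.rating))["one.rating"]     # attenuated, roughly 0.05
coef(lm(y ~ avg.rating))["avg.rating"]     # less attenuated, roughly 0.075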
Reporting of results
The difference reported in this study was 44% compared to 52%: you could say that the most attractive parents in the study were 8 percentage points more likely than the others to have girls. Or you could say that they were .08/.52 = 15% less likely to have a boy. But on page 9 of the paper, it says, “very attractive respondents are about 26% less likely to have a son as the first child.” This crept up to 36% in this news article, which was cited by Stephen Dubner on the Freakonomics blog.
Where did the 26% come from? Kanazawa appears to have run a logistic regression of sex of child on an indicator for whether the parent was judged to be very attractive. The logistic regression coefficient was -0.31. Since the probabilities are near 0.5, the right way to interpret the coefficient is to divide it by 4: -0.31/4=-0.08, thus an effect of 8 percentage points (which is what we saw above). For some reason, Kanazawa exponentiated the coefficient: exp(-0.31)=0.74, then took 0.74-1=-0.26 to get a result of 26%. That calculation is inappropriate (unless there is something I’m misunderstanding here). But, of course, once it slipped past the author and the journal’s reviewers, it would be hard for a reporter to pick up on it.
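As a quick check on both versions of the arithmetic, here is the calculation in base R (a sketch; the 52% baseline is the comparison-group figure quoted above):

b <- -0.31                  # the reported logistic regression coefficient
b / 4                       # divide-by-4 rule: about -0.08, i.e. 8 percentage points
plogis(qlogis(0.52) + b)    # implied Pr(boy) for the "very attractive" group: about 0.44
exp(b) - 1                  # about -0.26: a 26% reduction in the odds, not in the probability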
Coauthors have an incentive to catch mistakes
I’m disappointed that Kanazawa couldn’t find a statistician in the Interdisciplinary Institute of Management where he works who could have checked his numbers (and also advised him against the bar graph display in his Figure 1, as well as advised him about multiple hypothesis testing). Just to be clear on this: we all make mistakes, I’m not trying to pick on Kanazawa. I think we can all do better by checking our results with others. Maybe the peer reviewers for the Journal of Theoretical Biology should’ve caught these mistakes, but in my experience there’s no substitute for adding someone on as a coauthor, who then has a real incentive to catch mistakes.
Summary
Kanazawa is looking at some interesting things, and it’s certainly possible that the effects he’s finding are real (in the sense of generalizing to the larger population). But the results could also be reasonably explained by chance. I think a proper reporting of Kanazawa’s findings would be that they are interesting, and compatible with his biological theories, but not statistically confirmed.
My point in discussing this article is not to be a party pooper or to set myself up as some sort of statistical policeman or to discourage innovative work. Having had this example brought to my attention, I was curious enough to follow it up, and then I wanted to share my newfound understanding with others. Also, this is a great example of multiple hypothesis testing for a statistics class.
It would be worth writing a reply to JTB about this: it shouldn't take a lot of time, and it really would make biologists and journal editors aware of the pitfalls in what they're doing.
Oh, the tales I could tell….
Bob
Bob,
Yes, I did this (and sent a copy to Kanazawa too). I hope it will raise some awareness.
Would you mind explaining why you needed to divide the parameter estimate by 4 (or post a reference)? Sadly, based on my statistical training in a social science, I would have made the same error.
1/4 is the derivative of the inverse-logit function evaluated at 0. (Equivalently, 4 is the derivative of the logit function evaluated at 0.5.)
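For anyone who wants to verify that numerically, here is a quick check in R (plogis is the inverse logit):

eps <- 1e-6
(plogis(eps) - plogis(-eps)) / (2 * eps)   # approximately 0.25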
Logistic regression isn't really my thing, but isn't -0.31 the log of the relative odds of one sex vs. the other between the two groups? Then the odds (not the probability) of getting a boy in the "very attractive" group are 0.74 times those in the other group, or, I suppose, 26% lower. Ignoring your other objections, of course.
Robin,
Yes, that's right: the odds are about 26% lower. But that is not the same as being "26% less likely to have a son as the first child."
Robin,
Here's the difference. The odds of having a boy in the "ugly" control group are basically 1:1, or 50%. (1 outcome out of 2)
The logistic regression result of an odds ratio of 0.74 means that the odds in the "beautiful" group are adjusted to 0.74:1. That's equal to a probability of getting a boy of 0.74/1.74 (that is, 0.74 outcomes out of a total of 1.74 outcomes), or about 43%: roughly an 8 percentage point reduction in the probability of a boy.
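To put that conversion in one place, here's a small helper function (my own sketch, not anything from the paper) that applies an odds ratio to a baseline probability:

# Apply an odds ratio to a baseline probability (illustration only)
odds.ratio.to.prob <- function(p0, or) {
  odds <- p0 / (1 - p0) * or    # scale the baseline odds by the odds ratio
  odds / (1 + odds)             # convert back to a probability
}
odds.ratio.to.prob(0.50, 0.74)  # about 0.43, roughly 8 points below 0.50
odds.ratio.to.prob(0.52, 0.74)  # about 0.44, matching the 44% discussed above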
Sure, I recognize this. I was just suggesting that this seems to be the way the original author got to these numbers.
It seems to me that, while the statements of Kanazawa are incorrect in the usage of the statistical community, the ambiguity of English makes them less surprising in a broader context. I did a 'define:odds' search on Google, and several of the definitions used the word 'likelihood' or 'likely', including the first one, from Princeton's WordNet. They preserve the comparison of the probability of the event to that of its complement, but some of the definitions farther down the page do not. In ordinary English, Kanazawa's use is barely even nonstandard. Of course, we're talking about an article in the Journal of Theoretical Biology, which I would expect to use the statistical definitions.
My apologies. When I took a second look at your post, it hit me that you might have simplified what you were saying on purpose. Seeing as how you mentioned both "odds" and "probability" and all. :^)
If the fault is the use of the phrase "26% more likely" instead of saying a "26% increase in the odds," this is a usage that I agree is incorrect but which is PERVASIVE in epidemiology. It's a fight I've given up trying to fight there. One could probably find hundreds/thousands of examples from a text search of "logistic regression" AND ("more likely" OR "less likely").
Dr. Kanazawa is quite a character, isn't he? I think his papers are great because they are excellent material for problem sets of the spot-and-describe-the-error kind.
For example, I just read this paper of his claiming that beautiful people are indeed more intelligent. Initially, I thought it was tongue-in-cheek, but after I realized he was serious, it took me a few minutes of thought to clearly identify what I think is the fundamental problem with his "theorem". (Btw, it is clear from the Discussion that he takes the "theorem" status of his argument quite seriously.)
So, if male intelligence is correlated with male status, and higher status in men is correlated with beauty in their wives, it does not necessarily follow that male intelligence is correlated with spousal beauty. This could happen, for example, if status in men is the result of many factors (intelligence being one of them), and if the correlation between male status and spousal beauty is driven by one of those other factors, such as social class.
Am I right?
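Here is a toy simulation of what I mean (my own sketch, nothing to do with Kanazawa's data): status depends on intelligence and on an unrelated factor, spousal beauty tracks only the unrelated factor, and intelligence and spousal beauty end up essentially uncorrelated even though both are correlated with status.

set.seed(1)
n <- 1e5
intelligence <- rnorm(n)
social.class <- rnorm(n)                        # the "other factor"
status <- intelligence + social.class + rnorm(n)
spousal.beauty <- social.class + rnorm(n)       # beauty tracks class, not intelligence
round(c(cor(intelligence, status),              # about 0.58
        cor(status, spousal.beauty),            # about 0.41
        cor(intelligence, spousal.beauty)), 2)  # about 0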
I fail to see why he doesn't simply attempt to demonstrate a correlation of some test measure of intelligence with some independently obtained rating measure of beauty in the same individual. Or at least: show that beautiful women were more likely to marry intelligent men, and then hope that genetics will take care of the rest of the argument. Wouldn't this be the direct way of looking at this issue?
I am but an amateur statistician.