How should unproven findings be publicized?

A year or so ago I heard about a couple of papers by Satoshi Kanazawa on "Engineers have more sons, nurses have more daughters" and "Beautiful parents have more daughters."  The titles surprised me, because in my acquaintance with such data, I’d seen very little evidence of sex ratios at birth varying much at all, certainly not by 26% as was claimed in one of these papers.  I looked into it and indeed it turned out that the findings could be explained as statistical artifacts–the key errors were, in one of the studies, controlling for intermediate outcomes and, in the other study, reporting only one of multiple potential hypothesis tests.  At the time, I felt that a key weakness of the research was that it did not include collaboration with statisticians, experimental psychologists, or others who are aware of these issues.

I did my duty and wrote a letter which was published in the Journal of Theoretical Biology.  (I also emailed Kanazawa a copy but didn’t hear back from him.)  There things stood until yesterday when I saw in Tyler Cowen’s blog that Kanazawa had written an article in Psychology Today repeating the claim, "Americans who are rated "very attractive" have a 56 percent chance of having a daughter for their first child, compared with 48 percent for everyone else."  And, even more amazingly (to me), Kanazawa is publishing a book called "Why Beautiful People Have More Daughters."  The work has also been publicized in various places, including a positive mention by Stephen Dubner here (and a more mocking mention here).

OK, now to get to my question.  Kanazawa’s conjectures have not been demonstrated statistically.  (For example, the claim about beautiful parents having more daughters was barely statistically significant and was one of many possible comparisons that could’ve been done with those data.)  So it’s a little disturbing to see this presented as "true, supported by documented scientific evidence."  On the other hand, his claims might be true.  It would be more scientifically appropriate for Kanazawa to present these results as "speculations which are supported by data," but maybe Psychology Today expects a different sort of writing?
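To make the "one of many possible comparisons" point concrete, here is a minimal sketch in Python. It assumes the comparisons are independent tests of true null hypotheses at the 0.05 level; the independence, the numbers of tests, and the function names are illustrative assumptions, not a reconstruction of Kanazawa's data or analysis. It computes the chance that at least one comparison comes out "significant" by luck alone, and checks the formula with a small simulation.

```python
import random


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one p < alpha among independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulated_rate(num_tests: int, alpha: float = 0.05,
                   num_studies: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: fraction of null 'studies' with at least one hit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        # Under a true null hypothesis, a p-value is uniform on [0, 1].
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / num_studies


if __name__ == "__main__":
    # Hypothetical numbers of comparisons a researcher could have tried.
    for k in (1, 5, 10, 20):
        print(k, round(familywise_error_rate(k), 3), round(simulated_rate(k), 3))
```

Under these assumptions, ten possible comparisons already give roughly a 40% chance of at least one nominally significant result when nothing is going on, which is why a single barely significant comparison is weak evidence.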

I just don’t know how to think about this.  It’s clear to me how journalists, bloggers, and reviewers should react:  they should discuss this work with skepticism.  The trouble is that the papers were published in a reputable journal (J. Theor. Biology), and a journalist/blogger/reviewer who does not happen to see my critique would naturally tend to trust the result.  (I was only tipped off because I’d already read a bit in the area of sex ratios, for no other reason than that I’ve used boy and girl births as a teaching example.  This was essentially a bit of Bayesian reasoning by me, that Kanazawa’s conclusions didn’t match my priors, leading me to look more carefully at his reasoning.)  So I can’t really blame the editors of Psychology Today, or maybe even the editors at Perigee Books, for not knowing any better.

But should I blame Kanazawa?  I don’t want to be dismissive of scientific speculation–I don’t like the idea of statistician as censor–so maybe there would be a way for him to present more of the full statistical story in his book (for example, in the beauty-and-daughters study, a graph with the proportion of girls born to people of all five beauty categories–rather than just comparing categories 1-4 to category 5–along with the beauty assessments from all three waves of the study).  It’s a tough call to decide how to present speculative findings.

14 thoughts on “How should unproven findings be publicized?”

  1. In addition to Kanazawa I would also blame the
    review process of scientific journals. I don't
    know the review procedures of JOTB, but I imagine
    it isn't too different from those of journals
    that I'm familiar with. Articles are not
    refereed by a large group of highly qualified
    and interested individuals; they are refereed
    by the first two or three people whom the AE
    happens to talk into providing a review.
    These referees may lack the time, interest and/or
    ability to provide a thorough review of the
    scientific claims of the article.
    Additionally, if the AE recruits reviewers
    from the author's bibliography, it is
    likely that the pool of reviewers will
    not be sufficiently skeptical of the author's

    Is there an alternative? I'm hoping that
    dynamical peer review (e.g.
    will evolve to the point that it can replace
    our current system.

  2. This is most troubling for a journalist (me). Given the intro, I assumed the punch line would be some hazy gray literature thing that had not been peer reviewed.

  3. Personally, I think you're being a little too generous to Kanazawa. Particularly now that you've drawn his attention to problems with his statistical analysis, it seems slightly disingenuous of him to continue making the forceful claims he seems to be making. This is even more so if he knows that he's writing for an audience that may not be able to assess those claims adequately themselves – in such a case I think there's even more of an onus on him to state his claims carefully…

  4. I agree with the previous comment that you may be too generous – especially given that Kanazawa is a member of the methodology institute at LSE and teaches graduate statistics courses in the social sciences.

  5. Andrew, that is one well-written letter. Thank you for linking to it. ^_^

    The "Psychology Today" article is disturbing for more than just the beauty/child-sex issue: Just prior to the Iraq war, "Frontline" reported that the most prolific group of suicide bombers were the Tamil Tigers, fighting in Sri Lanka. Tamils, the people who invented the suicide-bomber explosive-vest, are mostly Hindu; approximately 5% are Christian or Muslim (see Wikipedia under "Tamil").

    I don't know if the Tamil Tigers have lost the top spot in the last few years; however, Wikipedia reports an estimated 6.5 million Sri Lankan Tamils. Compared to the number of Muslims from which to draw suicide bombers, it certainly seems unlikely that something about Islam can be blamed for suicide bombers.

    Indeed, the article suggests that suicide bombing is a strategy for success in the mating game. One suspects that logic only really works for swamp dragons on Discworld.

  6. Andrew-

    I applaud your willingness to raise the question. I also agree with a number of the comments that question Kanazawa's integrity on the matter. This type of work results in science and statistics being dismissed by the general public. All the downside so our friend Kanazawa can become a celebrity. I'm not buying

  7. Andrew,

    Self doubt is a good thing when it comes to addressing the methods as opposed to the actual phenomena at hand. However, as you know in the social science world, there is an enormous societal concern that trumps letting bad results get press/book coverage.


  8. I would like to see reputable journals that focus on critiquing previous work. This would provide a nice forum for people to respond to studies like Kanazawa's and would probably encourage researchers like Kanazawa to use more solid methods. And, people that are interested in or skeptical of articles like Kanazawa's would know where to look for critiques.
    (Similarly, I would like to see a journal in my field that publishes results that confirm null (and less interesting) hypotheses.)

  9. I believe the community has to empirically show that the claims do not replicate.

    We recently got lucky in genetics because many of the false positive published claims could quickly be checked by other labs that already had the data in hand … and the false positive claims were made by well respected senior (basic) scientists in journals like Nature (now some of these journals are thinking of delaying publication until claims have been replicated by other groups.)

    As for "reporting only one of multiple potential hypothesis tests." I am aware of a survey of clinical researchers that suggests 50% just don't believe this is a problem. Empirical studies that track studies from ethics approval forward are trying to correct this misunderstanding (Altman and Chan).

    "At the time, I felt that a key weakness of the research was that it did not include collaboration with statisticians"

    My prior would be centered at 35% with little mass below 10%. What percentage of statisticians believe that "reporting only one of multiple potential hypothesis tests." is a problem?

    Unfortunately it is hard to do good empirical studies on questions like this and often they are not needed for those who understand that "2 + 2 = 4" (in an ideal world we would not need them) but they are needed for the wider community that must "step in" on these issues.

    "getting off my soap box now"


  10. There are similar issues in other fields.

    At the recent CERN workshop on Innovations in Scholarly Communication (OAI6) Prof. Alexander Lerchl gave an interesting presentation on the importance of having access to the raw data used in statistical analysis supporting research claims.

    "A group of researchers from the Medical University Vienna published data in 2005 and 2008 which showed that electromagnetic fields from mobile phones severly damaged DNA molecules of human cells. These publications caused intense debates about the safety of mobile phones, and politicians, physicians as well as the general public were extremely concerned. When looking at the data, however, first calculations led to the conclusion that they were "too good to be true" since the standard deviations of the mean values were already lower than the pure stochastic noise of the method. Later it turned out that one person who actually performed the experiments knew the blinding code of the exposure system so that data fabrication was easy. In addition, an electronic document from the group in Vienna, submitted as an abstract for a conference, contained hidden data which also proved that the published data were fabricated. An investigation by the University came to the conclusion that these publications contained fabricated data and should be retracted. So far, this did not happen. These cases highlight the need for deposition of original data when a manuscript is submitted in order to make investigations possible if suspicions about the scientific integrity of submitted or published articles arise."

  11. But if we adjust probabilities in proportion to the number of hypotheses we test, we will never find anything as critical p tends towards 0! :)

    Someone should teach these guys research design.

Comments are closed.