The tabloid in question is the journal Nature, which along with Science and PPNAS (the Proceedings of the National Academy of Sciences, publisher of gems such as the himmicanes and hurricanes study) has in recent years become notorious for publishing flashy but unsubstantiated scientific claims.
As Lord Acton never said, publicity corrupts, and absolute publicity corrupts absolutely. We now have a vicious cycle where journalists rely on Nature/Science/PPNAS for science news that is both exciting and respectable (these are considered top journals, after all), where scientists send their best and most exciting work to these journals (we all want publicity for our best work), and where Nature/Science/PPNAS thrive on the publicity.
The usual story is that we notice flaws in published papers that had managed to get through the scientific review process.
Today’s story is slightly different in that it is a report of a conference presentation for which there is no publication, indeed not even a preprint that we could look at.
I learned about this one from statistician Thomas Lumley, who links to a news article by Sara Reardon in Nature with subheadline, “Twin study reveals five DNA markers that are associated with sexual orientation.” Reardon reports:
Researchers collected DNA samples in saliva from 37 pairs of identical twins in which only one twin was gay, and 10 pairs in which both were gay. By scanning the twins’ epigenomes, the researchers found five epi-marks that were more common among the gay men than in their genetically identical straight brothers. An algorithm they developed based on the five epi-marks could correctly predict the sexual orientation of men in the study 67% of the time. UCLA computational geneticist Tuck Ngun will present the work on 8 October at the American Society of Human Genetics meeting in Baltimore, Maryland.
There are some problems here. First, as Lumley points out:
70% accuracy doesn’t seem all that impressive. Using the usual figures on the proportion of men who are gay, the approach of assuming everyone is straight unless you are told otherwise is better than 90% accurate, and doesn’t need expensive genetics. Presumably they mean something different by 70% accuracy, but we don’t know what.
Indeed, it’s hard to know exactly what they did. Here’s a guess. The study in question seems to have 57 gay men and 37 straight men (each of the 37 discordant pairs contributes one gay and one straight man, and each of the 10 concordant pairs contributes two gay men). So in this case the optimal rule is to always guess gay, which would be correct 57/94, or 61%, of the time. So a rule that is correct 67% of the time in this sample is not very impressive, especially given that they got to choose the rule based on their data!
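The arithmetic behind that guess fits in a few lines. This sketch assumes the counts come only from the twin pairs described in the news report (37 discordant, 10 concordant) and ignores anything the unpublished study may have done differently.

```python
# Counts implied by the news report (an assumption: no other subjects):
# 37 discordant pairs (one gay, one straight) and 10 concordant pairs (both gay).
discordant_pairs = 37
concordant_pairs = 10

gay = discordant_pairs + 2 * concordant_pairs  # one per discordant pair, two per concordant pair
straight = discordant_pairs                    # one per discordant pair
total = gay + straight

# Majority-class rule: always guess the more common label.
baseline_accuracy = max(gay, straight) / total
print(f"{gay} gay, {straight} straight, always-guess-gay accuracy = {baseline_accuracy:.1%}")
```

Any classifier reported on this sample should be compared with that baseline, not with 50%.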
And that brings me to my second concern, which is statistical. The report says that researchers found 5 epi-marks based on a sample of 47 twin pairs. That seems like a serious selection problem: it may not be so hard to find 5 things that fit your data even if everything were pure noise.
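To see how easy it can be to find such markers in noise, here is a minimal simulation. It is not a reconstruction of the study's method, which hasn't been published. Everything in it is an assumption: the 1,000 candidate "marks" (the report doesn't say how many were screened), screening by difference in group means, a summed score, and a cutoff tuned on the same data. It also ignores the paired twin structure. Because the candidate cutoffs include the "always guess gay" rule, the in-sample accuracy can never fall below the 61% baseline, and the screening plus tuning will usually push it higher even though no feature carries any signal.

```python
import numpy as np

def noise_only_selection(n_gay=57, n_straight=37, n_marks=1000, n_keep=5, seed=0):
    """Screen pure-noise 'marks' for the n_keep most associated with the label,
    then tune a one-cutoff rule on the same data and return its accuracy.
    All settings are hypothetical; none come from the study."""
    rng = np.random.default_rng(seed)
    y = np.array([1] * n_gay + [0] * n_straight)  # 1 = gay, 0 = straight
    X = rng.standard_normal((y.size, n_marks))      # features unrelated to y

    # Step 1: keep the marks with the largest absolute difference in group means.
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    keep = np.argsort(-np.abs(diff))[:n_keep]

    # Step 2: sum the kept marks, each signed so higher values point toward label 1.
    score = X[:, keep] @ np.sign(diff[keep])

    # Step 3: pick the cutoff that maximizes accuracy on the same data.
    # -inf predicts everyone as 1 ("always guess gay"); +inf predicts everyone as 0.
    cutoffs = np.concatenate(([-np.inf], np.sort(score), [np.inf]))
    accuracies = [np.mean((score >= c) == y) for c in cutoffs]
    return keep, max(accuracies)

keep, acc = noise_only_selection()
print("selected marks:", keep, f"in-sample accuracy = {acc:.0%}")
```

Rerunning with a different `seed` picks out a different set of "associated" marks each time, which is the point: the selection is fitting noise.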
In her news article, Reardon does write:
Associations found in small studies are prone to evaporate when tested in larger groups.
It’s great to put this warning in, but this is one sentence buried 16 sentences deep within her article, and it doesn’t really do much to counteract the impression given by the generally positive and unquestioning tone of the article.
I was curious what other publicity this study has received, so I did a search on Google News. It listed 148 articles, including pieces in respected publications such as the Telegraph and Guardian of London, and the Los Angeles Times.
One of the better reports is by Virginia Hughes of BuzzFeed. The headline there is, “Epigenetic Test Can Predict Homosexuality, Controversial Study Claims.” Well put.
And, in the Atlantic, Ed Yong’s report is headlined, “No, Scientists Have Not Found the ‘Gay Gene’: The media is hyping a study that doesn’t do what it says it does.” Yong mentions the overfitting concern noted above and provides some additional details:
As far as could be judged from the unpublished results presented in the talk, the team used their training set to build several models for classifying their twins, and eventually chose the one with the greatest accuracy when applied to the testing set.
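Yong's description points to a second, subtler problem: once the testing set is used to choose among models, the winning model's test accuracy is no longer an unbiased estimate of how well it would do on new data. Here is a rough sketch, assuming (hypothetically) 10 candidate models that each have the same true accuracy of 60%, scored on a 30-person testing set; the report gives none of these numbers. It also treats the models' test-set errors as independent, which exaggerates the effect compared with closely related models, but the direction of the bias is the same.

```python
import numpy as np

def best_of_k_test_accuracy(p_true=0.6, n_test=30, n_models=10, n_reps=10_000, seed=1):
    """Average test-set accuracy of the best-looking model when every candidate
    model has the same true accuracy p_true. All settings are hypothetical."""
    rng = np.random.default_rng(seed)
    # Correct test-set predictions for each model in each replication,
    # assuming independent errors across models (a simplification).
    correct = rng.binomial(n_test, p_true, size=(n_reps, n_models))
    # Choose the model with the highest test accuracy, then report that accuracy.
    best = correct.max(axis=1) / n_test
    return best.mean()

print(f"one model:  {best_of_k_test_accuracy(n_models=1):.2f}")
print(f"best of 10: {best_of_k_test_accuracy(n_models=10):.2f}")
```

With a single model the average comes back close to the true 0.60; with ten candidates, the reported "best" accuracy is noticeably higher purely because of the selection step.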
I don’t know how Yong got this information—perhaps someone leaked a preprint of the research paper to him? But, in any case, yeah, this selection problem is a big deal.
To put it another way, why should we believe these headlines? Because someone from a respected university gave a conference talk on it? That’s not enough: conference talks are full of speculative research efforts. Because it was featured in a news article in Nature? No.
As I see it, the problem is not with the research itself—I disagree with Yong’s statement that perhaps the best option with this research is “to not do it at all”—but with its presentation as truth, or even as provisional truth. Speculation is fine, just label it as such.
P.S. Reardon also refers unquestioningly to the claim that “the chance of a man being gay increases by 33% for each older brother he has,” but the link here is to a 2001 paper on the topic by Ray Blanchard, and last time I looked at this work, back in 2006, I had some concerns about this claim (also see comments on that post for some interesting discussion).
P.P.S. More here. Researcher Tuck Ngun defends his twin study and I remain skeptical.