I bear some of the blame for this.
When I heard about John Ioannidis’s paper, “Why Most Published Research Findings Are False,” I thought it was cool. Ioannidis was on the same side as me, and Uri Simonsohn, and Greg Francis, and Paul Meehl, in the replication debate: he felt that there was a lot of bad work out there, supported by meaningless p-values, and his paper was a demonstration of how this could come to pass, how it was that the seemingly-strong evidence of “p less than .05” wasn’t so strong at all.
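The arithmetic behind that demonstration can be sketched in a few lines. This is my own illustration, not Ioannidis's exact numbers: the positive predictive value (PPV) of a "significant" result depends on the prior probability that the hypothesis is true and on the study's power, and with long-shot hypotheses and low power, most significant findings are false positives.

```python
# Illustrative sketch (hypothetical numbers): the positive predictive
# value of a significant result, given the prior probability that the
# hypothesis is true, the study's power, and the significance level.

def ppv(prior, power, alpha=0.05):
    """Probability that a significant result reflects a true effect."""
    true_pos = prior * power          # true hypotheses correctly flagged
    false_pos = (1 - prior) * alpha   # false hypotheses flagged anyway
    return true_pos / (true_pos + false_pos)

# Well-powered studies of plausible hypotheses: p < .05 means a lot.
print(round(ppv(prior=0.5, power=0.8), 2))   # 0.94
# Low-powered studies of long shots: most "findings" are false.
print(round(ppv(prior=0.1, power=0.2), 2))   # 0.31
```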
I didn’t (and don’t) quite buy Ioannidis’s mathematical framing of the problem, in which published findings map to hypotheses that are “true” or “false.” I don’t buy it for two reasons.

First, statistical claims are only loosely linked to scientific hypotheses. What, for example, is the hypothesis of Satoshi Kanazawa? Is it that sex ratios of babies are not identical among all groups? Or that we should believe in “evolutionary psychology”? Or that strong powerful men are more likely to have boys, in all circumstances? Some circumstances? Etc. Similarly with that ovulation-and-clothing paper: is the hypothesis that women are more likely to wear red clothing during their most fertile days? Or during days 6-14 (which are not the most fertile days of the cycle)? Or only on warm days? Etc.

The second problem is that the null hypotheses being tested and rejected are typically point nulls—the model of zero difference, which is just about always false. So the alternative hypothesis is just about always true. But the alternative to the null is not what is being specified in the paper. And, as Bargh etc. have demonstrated, the hypothesis can keep shifting. So we go round and round.
Here’s my point. Whether you think the experiments and observational studies of Kanazawa, Bargh, etc., are worth doing, or whether you think they’re a waste of time: either way, I don’t think they’re making claims that can be said to be either “true” or “false.” And I feel the same way about medical studies of the “hormone therapy causes cancer” variety. It might be possible to coerce these claims into specific predictions about measurable quantities, but that’s not what these papers are doing.
I agree that there are true and false statements. For example, “the Stroop effect is real and it’s spectacular” is true. But when you move away from these super-clear examples, it’s tougher. Does power pose have real effects? Sure, everything you do will have some effect. But that’s not quite what Ioannidis was talking about, I guess.
Anyway, I’m still glad that Ioannidis wrote that paper, and I agree with his main point, even if I feel it was awkwardly expressed by being crammed into the true-positive, false-positive framework.
But it’s been 12 years now, and it’s time to move on. Back in 2013, I was not so pleased with Jager and Leek’s paper, “Empirical estimates suggest most published medical research is true.” Studying the statistical properties of published scientific claims, that’s great. Doing it in the true-or-false framework, not so much.
I can understand Jager and Leek’s frustration: Ioannidis used this framework to write a much celebrated paper; Jager and Leek do something similar—but with real data!—and get all this skepticism. But I do think we have to move on.
And I feel the same way about this new paper, “Too True to be Bad: When Sets of Studies With Significant and Nonsignificant Findings Are Probably True,” by Daniel Lakens and Alexander Etz, sent to me by Kevin Lewis. I suppose such analyses are helpful for people to build their understanding, but I think the whole true/false thing with social science hypotheses is just pointless. These people are working within an old-fashioned paradigm, and I wish they’d take the lead from my 2014 paper with Carlin on Type M and S errors. I suspect that I would agree with the recommendations of this paper (as, indeed, I agree with Ioannidis), but at this point I’ve just lost the patience for decoding this sort of argument and reframing it in terms of continuous and varying effects. That said, I expect this paper by Lakens and Etz, like the earlier papers by Ioannidis and Jager/Leek, could be useful, as I recognize that many people are still comfortable working within the outmoded framework of true and false hypotheses.
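To give a sense of what the Type M and Type S framing looks like in practice, here is a quick simulation in the spirit of my paper with Carlin, with made-up numbers of my own: take a small true effect measured with a lot of noise, keep only the statistically significant estimates, and ask how often they get the sign wrong (Type S) and by how much they exaggerate the magnitude (Type M).

```python
# Illustrative simulation of Type S (sign) and Type M (magnitude)
# errors, in the spirit of Gelman and Carlin (2014). The numbers
# below are hypothetical, chosen to show the noisy-study regime.
import random

random.seed(1)
true_effect, se = 0.1, 0.5   # small true effect, large standard error
crit = 1.96 * se             # two-sided p < .05 threshold for the estimate

# Simulate many replications and keep only the "significant" ones.
sig = []
for _ in range(100_000):
    est = random.gauss(true_effect, se)
    if abs(est) > crit:
        sig.append(est)

# Type S: among significant estimates, how often is the sign wrong?
type_s = sum(e < 0 for e in sig) / len(sig)
# Type M: by what factor does a significant estimate exaggerate, on average?
type_m = sum(abs(e) for e in sig) / len(sig) / true_effect

print(f"Type S error rate: {type_s:.2f}")
print(f"Type M exaggeration factor: {type_m:.1f}")
```

In this regime, roughly a quarter of the significant estimates point in the wrong direction, and on average they overstate the true effect by an order of magnitude: the point is that “significant” and “true” come apart badly when studies are noisy.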