A recent discussion between commenters Question and Fernando captured one of the recurrent themes here from the past year.
Question: The problem is simple: the researchers are disproving always false null hypotheses and taking this disproof as near proof that their theory is correct.
Fernando: While it is probably true that researchers misuse NHST, the problem with tabloid science is broader and deeper. It is systemic.
Question: I do not see how anything can be deeper than replacing careful description, prediction, falsification, and independent replication with dynamite plots, p-values, affirming the consequent, and peer review. From my own experience I am confident in saying that confusion caused by NHST is at the root of this problem.
Fernando: Incentives? Impact factors? Publish or die? “Interesting” and “new” valued above quality and reliability, or actually answering a research question, and a silly and unbecoming obsession with being quoted in the NYT, etc. Given the incentives, something silly is bound to happen. At issue is whether NHST is cause or effect.
At this point I was going to respond in the comments, but I decided to make this a separate post (at the cost of pre-empting yet another scheduled item on the queue), for two reasons:
1. I’m pretty sure that a lot fewer people read the comments than read the posts; and
2. I thought of this great title (see above) and I wanted to use it.
First, let’s get Bayes out of the way
Just to start with, none of this is a Bayes vs. non-Bayes battle. I hate those battles, partly because we sometimes end up with the sort of the-enemy-of-my-enemy-is-my-friend reasoning that leads smart, skeptical people who should know better to defend all sorts of bad practices with p-values, just because they (the smart skeptics) are wary of overarching Bayesian arguments. I think Bayesian methods are great, don’t get me wrong, but the discussion here has little to do with Bayes. Null hypothesis significance testing can be done in a non-Bayesian way (of course; just see all sorts of theoretical-statistics textbooks), but some Bayesians like to do it too, using Bayes factors and all the rest of that crap to decide whether to accept models of the theta=0 variety. Do it using p-values or Bayes factors; either way, it’s significance testing with the goal of rejecting models.
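To make the equivalence concrete, here’s a minimal sketch (my illustration, not anything from the discussion above) that tests the point null theta = 0 both ways on the same simulated data. The normal likelihood, the sample size, and the N(0, 1) prior on theta under the alternative are all assumptions I picked for illustration:

```python
# A minimal sketch contrasting the two flavors of significance testing:
# a frequentist p-value and a Bayes factor, both aimed at the point null
# theta = 0. The setup (normal data, N(0, 1) prior under the alternative)
# is an illustrative assumption, not anything from the post.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, sigma, theta_true = 200, 1.0, 0.1   # a small but nonzero "true" effect
y = rng.normal(theta_true, sigma, size=n)

ybar = y.mean()
se = sigma / np.sqrt(n)

# Frequentist route: z-test of H0: theta = 0.
z = ybar / se
p_value = 2 * stats.norm.sf(abs(z))

# Bayesian route: Bayes factor for H0: theta = 0 vs H1: theta ~ N(0, tau^2).
# Under H0, ybar ~ N(0, se^2); under H1, marginally, ybar ~ N(0, tau^2 + se^2).
tau = 1.0
bf01 = stats.norm.pdf(ybar, 0, se) / stats.norm.pdf(ybar, 0, np.sqrt(tau**2 + se**2))

print(f"p-value = {p_value:.4f}, BF01 = {bf01:.3f}")
# Either number can be used to "reject" the theta = 0 model:
# the machinery differs, but the accept/reject logic is the same.
```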
The Notorious N.H.S.T. as an enabler
I agree with the now-conventional wisdom expressed by the original commenter that null hypothesis significance testing is generally inappropriate. But I also agree with Fernando’s comment that the pressures of publication would lead to the aggressive dissemination of noise in any case. What I think is that the notorious N.H.S.T. is part of the problem: it’s a mechanism by which noise can be spread. This relates to my recent discussion with Steven Pinker (not published on the blog yet; it’s on the queue, you’ll see it in a month or so).
To put it another way, the reason I go on and on about multiple comparisons is not that I think it’s so important to get correct p-values, but rather that these p-values are being used as the statistical justification for otherwise laughable claims.
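Here’s a quick simulation of how this plays out (again my own sketch; the 20-outcome setup and sample size are illustrative assumptions): run enough comparisons on pure noise and some p-value will cross 0.05, ready to serve as the statistical justification for whatever story fits.

```python
# A minimal sketch of how multiple comparisons turn pure noise into
# "significant" findings: test 20 outcomes that are all just noise and
# report whichever clears p < 0.05. All numbers here are illustrative
# assumptions, not anything from the post.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_outcomes = 50, 20

# Every outcome is pure noise: there is no real effect anywhere.
treatment = rng.normal(0, 1, size=(n_subjects, n_outcomes))
control = rng.normal(0, 1, size=(n_subjects, n_outcomes))

p_values = stats.ttest_ind(treatment, control, axis=0).pvalue
significant = np.flatnonzero(p_values < 0.05)

print(f"'significant' outcomes: {significant}, p = {p_values[significant].round(3)}")
# With 20 tests, about one spurious p < 0.05 is expected by chance alone,
# and that one is what gets written up.
```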
I agree with Fernando that, if it weren’t N.H.S.T., some other tool would be used to give the stamp of approval to data-based speculations. But null hypothesis testing is what’s being used now, so I think it’s important to continue to point out the confusion between research hypotheses and statistical hypotheses, and the fallacy of, as the commenter put it, “disproving always false null hypotheses and taking this disproof as near proof that their theory is correct.”
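One last sketch (my numbers, chosen purely for illustration) of that fallacy: when the point null is essentially always false, a large enough sample will “disprove” it no matter how trivial the true effect, which tells us nothing about whether the research theory is correct.

```python
# A quick simulation of the fallacy quoted above: the point null is
# essentially always false, so with enough data you can always "disprove"
# it, which says nothing about the research theory. The effect size and
# sample sizes below are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta = 0.02  # a real but scientifically trivial effect

for n in (100, 10_000, 1_000_000):
    y = rng.normal(theta, 1.0, size=n)
    t, p = stats.ttest_1samp(y, 0.0)
    print(f"n = {n:>9,}: p = {p:.2g}")
# As n grows, p -> 0 and the null is "rejected" every time, even though
# the effect stays a trivial 0.02 standard deviations.
```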
P.S. “The aggressive dissemination of noise” . . . I like that.