Steve Ziliak points me to this article by the always-excellent Carl Bialik, slamming hypothesis tests. I only wish Carl had talked with me before posting so hastily, though! I would’ve argued with some of the things in the article. In particular, he writes:
Reese and Brad Carlin . . . suggest that Bayesian statistics are a better alternative, because they tackle the probability that the hypothesis is true head-on, and incorporate prior knowledge about the variables involved.
Brad Carlin does great work in theory, methods, and applications, and I like the bit about the prior knowledge (although I might prefer the more general phrase “additional information”), but I hate that quote!
My quick response is that the hypothesis of zero effect is almost never true! The problem with the significance-testing framework (Bayesian or otherwise) is its obsession with the possibility of an exact zero effect. The real concern is not with zero, it’s with claiming a positive effect when the true effect is negative, or claiming a large effect when the true effect is small, or claiming a precise estimate of an effect when the true effect is highly variable, or . . . I’ve probably missed a few possibilities here but you get the idea.
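To see why this matters more than any point null, here’s a minimal simulation (the numbers are made up for illustration: a small, positive true effect of 0.2, measured with standard error 1, i.e., a badly underpowered study). Among the estimates that happen to clear the significance threshold, a nontrivial share have the wrong sign, and the magnitudes are wildly inflated:

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.2    # hypothetical: small and positive, but not exactly zero
se = 1.0             # standard error of each study's estimate
n_sims = 100_000

estimates = rng.normal(true_effect, se, n_sims)
significant = np.abs(estimates) > 1.96 * se   # two-sided z-test at the 5% level

# Among the "significant" findings: how often is the sign wrong,
# and how inflated is the magnitude, relative to the true effect?
sig_est = estimates[significant]
wrong_sign = np.mean(np.sign(sig_est) != np.sign(true_effect))
exaggeration = np.mean(np.abs(sig_est)) / abs(true_effect)

print(f"power (share significant):          {significant.mean():.3f}")
print(f"P(wrong sign | significant):        {wrong_sign:.3f}")
print(f"E(|estimate|) / |true| | signif.:   {exaggeration:.1f}")
```

Whether or not the true effect is exactly zero never enters into it; the trouble is the sign and magnitude errors you make conditional on declaring a discovery.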
In addition, none of Carl’s correspondents mentioned the “statistical significance filter”: the idea that, to make the cut of statistical significance, an estimate has to reach some threshold. As a result of this selection bias, statistically significant estimates tend to be overestimates, whether or not a Bayesian method is used, and whether or not there are any problems with fishing through the data.
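Here’s a quick sketch of the filter in action (again with made-up numbers: true effects drawn from a normal distribution with standard deviation 0.5, each estimated with standard error 1). There is no fishing here, just one honest test per study, yet the estimates that survive the filter are several times larger on average than the true effects behind them:

```python
import numpy as np

rng = np.random.default_rng(1)

n_studies = 100_000
true_effects = rng.normal(0.0, 0.5, n_studies)  # assumed spread of real, mostly small effects
estimates = true_effects + rng.normal(0.0, 1.0, n_studies)  # each study reports truth + noise (se = 1)

# The filter: only "significant" results make the cut.
significant = np.abs(estimates) > 1.96

print(f"mean |estimate| among significant results: "
      f"{np.abs(estimates[significant]).mean():.2f}")
print(f"mean |true effect| behind those results:   "
      f"{np.abs(true_effects[significant]).mean():.2f}")
```

The filter guarantees that every surviving estimate exceeds 1.96 in absolute value; the true effects behind those estimates are under no such obligation.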
Bayesian inference is great (I’ve written a few books on the topic), but, y’know, garbage in, garbage out. If you start with a model of exactly zero effects, that’s what will pop out.
I completely agree with this quote from Susan Ellenberg, reported in the above article:
You have to make a lot of assumptions in order to do any statistical test, and all of those are questionable.
And being Bayesian doesn’t get around that problem. Not at all.
P.S. Steve Stigler is quoted as saying, “I don’t think in science we generally sanction the unequivocal acceptance of significance tests.” Unfortunately, I have no idea what he means here, given the two completely opposite meanings of the word “sanction” (see the P.S. here).
P.P.S. Mark Liberman informs me that “sanction” is an example of an auto-antonym. The linked Wikipedia page gives several other examples, including “dust” and “oversight.” We could also add the expression that in polite company is called “effing A,” as it represents a strong emotion but can be strongly positive or strongly negative.