All I have falsified is the irrelevant hypothesis that anchoring for rivers is like anchoring for doors.

All I have falsified is a hypothesis I considered only so that I could run the Bayes factor, one I was never interested in.

This critique is basically, “I used a tool to do a stupid thing, therefore the tool is stupid.” There’s not much to say about this sort of argument beyond pointing out its basic structure.

In the “Thing 2” section we have this line:

If Milton had said “the effect is anywhere between -50% and +50%”, a prediction that is *never* false, the Bayes factor would *always* deem it false, because observed values would always be, on average, unlikely, under ‘the alternative.’

This is clearly wrong, and it shows me that Simonsohn hasn’t carried out the math of Bayes factors (or at least, hasn’t carried it out correctly) and thus doesn’t understand them. Earlier in the post he posits a 1% observed effect; if that effect had been measured very precisely then it would fall well outside of the neighbourhood in which the prior predictive mass under the no-effect hypothesis is found, and then the alternative hypothesis of “some effect between -50% and +50%” would indeed be favoured.

How can we be having a discussion of Bayes factors — or any discussion about statistics — in which we’re talking about an observed effect and not giving consideration to the uncertainty with which the effect is measured? We wouldn’t be talking about p-values in this example without at least mentioning the standard error…
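To make the point concrete, here is a minimal sketch (my own, not from the post) of the Bayes factor for a point null against the "anywhere between -50% and +50%" alternative, evaluated at the 1% observed effect under two different standard errors:

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def bf10(x, se, lo=-0.5, hi=0.5):
    """Bayes factor for H1: effect ~ Uniform(lo, hi) against H0: effect = 0,
    assuming the observed effect x is Normal(true effect, se)."""
    m0 = phi(x / se) / se                                        # marginal likelihood under H0
    m1 = (Phi((hi - x) / se) - Phi((lo - x) / se)) / (hi - lo)   # marginal likelihood under H1
    return m1 / m0

# A 1% effect measured imprecisely: the point null is favoured.
print(bf10(0.01, se=0.05))    # ~0.13
# The same 1% effect measured very precisely: the wide alternative wins decisively.
print(bf10(0.01, se=0.001))   # astronomically large
```

The standard error is doing all the work here, which is exactly the point: the same observed effect and the same "never false" alternative give opposite verdicts depending on measurement precision.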

Sure, the difference is that those are things predicted by theory rather than the opposite of what is predicted by theory.

1. The issue of the point null hypothesis and its plausibility is really orthogonal to the Bayes factor, as Andrew suggests. The Bayes factor can be used to test *any* two models, as long as they make predictions. If you prefer a normal prior with variance epsilon instead of the point, nothing stops you from using that instead.
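As a hedged illustration of that flexibility (the numbers are mine, chosen only for the example): with a normal likelihood, the marginal likelihood of the data under a Normal(0, tau²) prior on the effect is analytic, so swapping the point null for a narrow "peri-null" normal is a one-line change of tau:

```python
import math

def marginal(x, se, tau):
    """Marginal likelihood of observed effect x (standard error se)
    under a Normal(0, tau**2) prior on the true effect.
    tau = 0 recovers the point null exactly."""
    s = math.sqrt(se**2 + tau**2)
    return math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

x, se = 0.01, 0.05
# Point null versus a diffuse Normal(0, 1) alternative:
bf_point = marginal(x, se, tau=1.0) / marginal(x, se, tau=0.0)
# Replace the point with a narrow Normal(0, 0.01**2) prior:
bf_peri = marginal(x, se, tau=1.0) / marginal(x, se, tau=0.01)
```

With these (invented) numbers the two versions give nearly identical Bayes factors, which is the sense in which the point-null objection is orthogonal to the machinery.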

2. For discrete parameter spaces, the update from prior to posterior distribution *is* a Bayes factor. Bayes factors are part of Bayes rule; this is why Jack Good termed them “Bayes” factors. See https://www.bayesianspectacles.org/bayes-factors-for-those-who-hate-bayes-factors/
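A toy discrete example (hypotheses and numbers invented for illustration): with two candidate coin biases, updating prior odds to posterior odds is exactly multiplication by the Bayes factor, which is the sense in which the factor is simply part of Bayes' rule:

```python
def binom_lik(k, n, p):
    """Binomial likelihood of k heads in n flips
    (the constant binomial coefficient cancels in the ratio)."""
    return p**k * (1 - p)**(n - k)

k, n = 7, 10
prior_odds = 1.0                                   # H1: p = 0.7 vs H0: p = 0.5, equally plausible a priori
bayes_factor = binom_lik(k, n, 0.7) / binom_lik(k, n, 0.5)
posterior_odds = prior_odds * bayes_factor         # Bayes' rule in odds form
```

Seven heads in ten flips shifts the odds modestly toward the biased coin; the update from prior to posterior *is* the Bayes factor.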

3. To me, in my line of work, Bayes factors seem to address the question that researchers care about: “Is there some signal in this noise, or am I just reading tea leaves?” Harold Jeffreys claimed (quoting from memory) that “variation should be considered random until evidence to the contrary is presented.” I think this is a nice statistical interpretation of what it means to be skeptical, and organized skepticism is an important part of science.

4. A recent paper outlining the philosophy of Jeffreys and its practical implementation is: Ly, A., Stefan, A., van Doorn, J., Dablander, F., van den Bergh, D., Sarafoglou, A., Kucharsky, S., Derks, K., Gronau, Q. F., Raj, A., Boehm, U., van Kesteren, E.-J., Hinne, M., Matzke, D., Marsman, M., & Wagenmakers, E.-J. (in press). The Bayesian methodology of Sir Harold Jeffreys as a practical alternative to the p-value hypothesis test. Computational Brain & Behavior. Preprint: https://psyarxiv.com/dhb7x

Cheers,

E.J.

https://www.ncbi.nlm.nih.gov/pubmed/31094544

Preprint, appendices, and R code freely available at https://osf.io/jmwk6/

P.S. I feel passionate about statistical methods because ultimately I care about the applications (as in our discussions of the way that classical statistical methods can lead to drastic overestimates of effect sizes in policy analysis, as discussed in section 2.1 of this article), or because I hate to see scientists waste their efforts and I’d like them to be able to do better (hence my annoyance at dead-on-arrival studies), or because I’m bothered by logical/mathematical/scientific errors (as with the hot hand fallacy fallacy). What I “hate” is not Bayes factors or p-values or whatever; it’s the way that these methods can lead us astray.

Jeff:

There’s nothing insulting here. The problem is that Bayes factors for these null hypothesis tests can easily give really bad answers. See chapter 7 of BDA3 or the above links for discussions and details.

More generally (going beyond Bayes factors to my problem with null hypothesis significance testing in general), I think the null hypothesis of zero effect and zero systematic error is very rarely interesting. I’m not interested in rejecting it, given that I know ahead of time I could reject it by gathering enough data. See also here too.
