All I have falsified is the irrelevant hypothesis that anchoring for rivers is like anchoring for doors.

All I have falsified is a hypothesis I considered only so that I could run the Bayes factor, one I was never interested in.

This critique is basically, “I used a tool to do a stupid thing, therefore the tool is stupid.” There’s not much to say about this sort of argument beyond pointing out its basic structure.

In the “Thing 2” section we have this line:

If Milton had said “the effect is anywhere between -50% and +50%”, a prediction that is *never* false, the Bayes factor would *always* deem it false, because observed values would always be, on average, unlikely, under ‘the alternative.’

This is clearly wrong, and it shows me that Simonsohn hasn’t carried out the math of Bayes factors (or at least, hasn’t carried it out correctly) and thus doesn’t understand them. Earlier in the post he posits a 1% observed effect; if that effect had been measured very precisely then it would fall well outside of the neighbourhood in which the prior predictive mass under the no-effect hypothesis is found, and then the alternative hypothesis of “some effect between -50% and +50%” would indeed be favoured.

How can we be having a discussion of Bayes factors — or any discussion about statistics — in which we’re talking about an observed effect and not giving consideration to the uncertainty with which the effect is measured? We wouldn’t be talking about p-values in this example without at least mentioning the standard error…
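To make the point concrete, here is a minimal sketch (my own, not from the post) of the Bayes factor for a point null against the "anywhere between -50% and +50%" alternative, evaluated at the 1% observed effect under two different standard errors:

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def bf10(x, se, lo=-0.5, hi=0.5):
    """Bayes factor for H1: effect ~ Uniform(lo, hi) against H0: effect = 0,
    assuming the observed effect x is Normal(true effect, se)."""
    m0 = phi(x / se) / se                                        # marginal likelihood under H0
    m1 = (Phi((hi - x) / se) - Phi((lo - x) / se)) / (hi - lo)   # marginal likelihood under H1
    return m1 / m0

# A 1% effect measured imprecisely: the point null is favoured.
print(bf10(0.01, se=0.05))    # ~0.13
# The same 1% effect measured very precisely: the wide alternative wins decisively.
print(bf10(0.01, se=0.001))   # astronomically large
```

The standard error is doing all the work here, which is exactly the point: the same observed effect and the same "never false" alternative give opposite verdicts depending on measurement precision.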

Sure, the difference is that those are things predicted by theory rather than the opposite of what is predicted by theory.

1. The issue of the point null hypothesis and its plausibility is really orthogonal to the Bayes factor, as Andrew suggests. The Bayes factor can be used to test *any* two models, as long as they make predictions. If you prefer a normal prior with variance epsilon instead of the point, nothing stops you from using that instead.
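As a hedged illustration of that flexibility (the numbers are mine, chosen only for the example): with a normal likelihood, the marginal likelihood of the data under a Normal(0, tau²) prior on the effect is analytic, so swapping the point null for a narrow "peri-null" normal is a one-line change of tau:

```python
import math

def marginal(x, se, tau):
    """Marginal likelihood of observed effect x (standard error se)
    under a Normal(0, tau**2) prior on the true effect.
    tau = 0 recovers the point null exactly."""
    s = math.sqrt(se**2 + tau**2)
    return math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

x, se = 0.01, 0.05
# Point null versus a diffuse Normal(0, 1) alternative:
bf_point = marginal(x, se, tau=1.0) / marginal(x, se, tau=0.0)
# Replace the point with a narrow Normal(0, 0.01**2) prior:
bf_peri = marginal(x, se, tau=1.0) / marginal(x, se, tau=0.01)
```

With these (invented) numbers the two versions give nearly identical Bayes factors, which is the sense in which the point-null objection is orthogonal to the machinery.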

2. For discrete parameter spaces, the update from prior to posterior distribution *is* a Bayes factor. Bayes factors are part of Bayes rule; this is why Jack Good termed them “Bayes” factors. See https://www.bayesianspectacles.org/bayes-factors-for-those-who-hate-bayes-factors/
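A toy discrete example (hypotheses and numbers invented for illustration): with two candidate coin biases, updating prior odds to posterior odds is exactly multiplication by the Bayes factor, which is the sense in which the factor is simply part of Bayes' rule:

```python
def binom_lik(k, n, p):
    """Binomial likelihood of k heads in n flips
    (the constant binomial coefficient cancels in the ratio)."""
    return p**k * (1 - p)**(n - k)

k, n = 7, 10
prior_odds = 1.0                                   # H1: p = 0.7 vs H0: p = 0.5, equally plausible a priori
bayes_factor = binom_lik(k, n, 0.7) / binom_lik(k, n, 0.5)
posterior_odds = prior_odds * bayes_factor         # Bayes' rule in odds form
```

Seven heads in ten flips shifts the odds modestly toward the biased coin; the update from prior to posterior *is* the Bayes factor.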

3. To me, in my line of work, Bayes factors seem to address the question that researchers care about: “Is there some signal in this noise, or am I just reading tea leaves?” Harold Jeffreys claimed (quoting from memory) that “variation should be considered random until evidence to the contrary is presented.” I think this is a nice statistical interpretation of what it means to be skeptical, and organized skepticism is an important part of science.

4. A recent paper outlining the philosophy of Jeffreys and its practical implementation is: Ly, A., Stefan, A., van Doorn, J., Dablander, F., van den Bergh, D., Sarafoglou, A., Kucharsky, S., Derks, K., Gronau, Q. F., Raj, A., Boehm, U., van Kesteren, E.-J., Hinne, M., Matzke, D., Marsman, M., & Wagenmakers, E.-J. (in press). The Bayesian methodology of Sir Harold Jeffreys as a practical alternative to the p-value hypothesis test. Computational Brain & Behavior. Preprint: https://psyarxiv.com/dhb7x

Cheers,

E.J.

https://www.ncbi.nlm.nih.gov/pubmed/31094544

Preprint, appendices, and R code freely available at https://osf.io/jmwk6/

P.S. I feel passionate about statistical methods because ultimately I care about the applications (as in our discussions of the way that classical statistical methods can lead to drastic overestimates of effect sizes in policy analysis, as discussed in section 2.1 of this article), or because I hate to see scientists waste their efforts and I’d like them to be able to do better (hence my annoyance at dead-on-arrival studies), or because I’m bothered by logical/mathematical/scientific errors (as with the hot hand fallacy fallacy). What I “hate” is not Bayes factors or p-values or whatever; it’s the way that these methods can lead us astray.

Jeff:

There’s nothing insulting here. The problem is that Bayes factors for these null hypothesis tests can easily give really bad answers. See chapter 7 of BDA3 or the above links for discussions and details.

More generally (going beyond Bayes factors to my problem with null hypothesis significance testing in general), I think the null hypothesis of zero effect and zero systematic error is very rarely interesting. I’m not interested in rejecting it, given that I know ahead of time I could reject it by gathering enough data. See also here too.
