
On this 4th of July, let’s declare independence from “95%”

Plan your experiment, gather your data, do your inference for all effects and interactions of interest. When all is said and done, accept some level of uncertainty in your conclusions: you might not be 97.5% sure that the treatment effect is positive, but that’s fine. For one thing, decisions need to be made. You were already going to make some decision with much less information—that is, with much more uncertainty. Now that you have more information, you can make a more informed decision. The other thing is, even if you did have a super-clean experiment with excellent measurements and a large and stable effect, so that you had that 95% interval excluding zero for your quantity of interest . . . so what? Whatever you care about is in the future, so even if your treatment was that much better than the alternative in the sample being studied, there’s no saying what it will do in future populations under different conditions.

That’s not to say that you can’t learn from data; I’m not saying that at all. You can learn a lot from data. But forget about 95%. Just do your best, live your life, and be open about your uncertainties. You might get run over by a bus tomorrow anyway.


  1. Rafael Garcia says:

    Love this.

  2. Zad Chow says:

    “You might get run over by a bus tomorrow anyway.”

    Can anyone confirm whether this statement is statistically significant?

  3. Shravan says:

    The problem is that this strategy, written down honestly in the paper itself, will lead to a desk reject.

    No paper, no job, no funding, no nothing. One would have to go to a data science company and make stuff up there instead of in academia.

    We have one paper from my lab in 2015 where we were honest about our conclusion that we have no conclusion. One reviewer recommended rejecting the paper because it didn’t advance knowledge. We barely scraped through to acceptance.

    Andrew, our paper, which will appear in the Journal of Memory and Language, marks the first time in my life that my lab has succeeded in getting a paper accepted in a top-ranking journal despite our writing that seven experiments led us to no clear conclusion. I doubt that I can replicate this in the future.

    Sadly nobody will give us an award for finding out that we didn’t find out much from a bunch of studies. But we (as in people in general) want that award.

    • Shravan says:

      In other words, dishonesty, to varying degrees (ranging from white lies to artful decoration of the truth), is a minimum prerequisite for success as typically measured in empirical science.

    • Guive says:

      I think the idea is to convince enough people of this that the need to exaggerate certainty goes away.

    • Worked at a Data Science Company says:

      Honestly, getting buy-in on null-ish results can be pretty hard in the private sector too. The decision-makers who were consuming the analyses I worked on focused heavily on significance testing. Even technically literate ones with backgrounds in quantitative fields expected some sort of significant finding out of every measurement exercise, even in small samples.

      Given an overall null result and a significant result in a subgroup, it was pretty hard to convince them to think about things as “product / software feature / advertisement X isn’t much better than the status quo”. They would instead zero in on the conclusion that “product / software feature / advertisement X is a highly significant improvement in subgroup Y”, even when subgroup sizes were small and effect sizes were too large to be believable.
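A quick simulation makes the subgroup problem concrete. The numbers here are my own, purely for illustration, not from the analyses described above: with a true effect of exactly zero, scanning ten small subgroups turns up at least one “significant” one in a large share of experiments, and any subgroup that clears the threshold necessarily shows a large estimated effect.

```python
# Illustrative simulation: the true treatment effect is zero everywhere,
# but we test it separately in 10 subgroups of 30-vs-30 observations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, n_subgroups, n_per_arm = 1000, 10, 30

hits = 0               # experiments with at least one p < 0.05 subgroup
winning_effects = []   # |estimated effect| in those "significant" subgroups
for _ in range(n_sims):
    found = False
    for _ in range(n_subgroups):
        treat = rng.normal(0.0, 1.0, n_per_arm)    # true effect = 0
        control = rng.normal(0.0, 1.0, n_per_arm)
        _, p = stats.ttest_ind(treat, control)
        if p < 0.05:
            found = True
            winning_effects.append(abs(treat.mean() - control.mean()))
    hits += found

print(f"experiments with a 'significant' subgroup: {hits / n_sims:.0%}")
print(f"mean |effect| in those subgroups (truth is 0): {np.mean(winning_effects):.2f}")
```

Under the null, roughly 1 − 0.95¹⁰ ≈ 40% of experiments produce at least one subgroup “win,” and with n = 30 per arm a subgroup can only reach p < 0.05 by showing an effect of about half a standard deviation or more — exactly the “too large to be believable” pattern described above.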

    • Dzhaughn says:

      Think about why you value that award.

      • So, he says “people in general.”

        Why should people in general value publication and payment for an honest attempt to determine something from a set of careful experiments about a certain topic?

        The mere fact that those experiments, carefully designed, were uninformative about the topic is itself informative about what the mechanisms could be: any mechanism that predicts those experiments should have been informative can be ruled out.

        Now, if something is uninformative because you did a bad job on the experiments, had a lot of measurement noise, and didn’t think about mechanisms before you did the experiment… no cookie for you. But when a scientist says “I’m going to test how much the earth’s motion through the ether affects the speed of light” and the result is “gee, we couldn’t detect any difference,” *that itself* leads to abandoning theories involving the ether… and moves the science forward towards the speed of light as an invariant…

        Imagine if no one had let Michelson and Morley publish because they didn’t find a “discovery” :-\

        To me, the following things should qualify your paper for publication automatically (sufficient, but not necessary):

        1) That a plausible theory or set of theories has been proposed
        2) That the experiment is constructed to address that theory
        3) That the data as collected follows the experimental design sufficiently to address the theory

        Notice how there’s no 4) that the result is statistically significant / that the conclusion is unambiguous / that the data was sufficient to eliminate all but one theory / etc

        • Shravan says:

          Yes, this is what I am trying to do right now. However, I may have to give up on publishing actual research in “prestige” journals. Those are reserved for “big news” stories that are probably fabricated to some degree or another. Fabrication implies fraud, but there are many ways to be completely honest and still get away with fabrication.

          A common trick in psycholinguistics is to fall back on repeated-measures ANOVAs when a linear mixed model gives you an absolute t value lower than 2 (a no-cookie scenario). What you do is print out so-called by-subjects and by-items ANOVAs, and if the former is significant, you are in a cookie scenario. You just hope the reviewer sees the magic p less than 0.05 in one case but ignores the other. This trick usually works. Is it fraud? Of course not; the authors were totally honest in the paper. The conclusion, that the overall significance test passed the threshold, is wrong, but heck, if the reviewers OK’d it, who can complain? I know that people in psycholinguistics even teach their grad students that as long as the by-subjects ANOVA comes out significant, you are golden.
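The statistical problem with that by-subjects fallback (Clark’s “language-as-fixed-effect fallacy”) can be shown in a few lines. This is my own illustrative simulation, not data from the lab above: when items vary in how strongly they respond to the condition, averaging over items hands every subject the same shared item-sampling noise, and the by-subjects test’s false positive rate climbs far above its nominal 5%.

```python
# Illustrative simulation: no true condition effect, but items carry
# random condition "slopes". A by-subjects analysis averages each
# subject's data over the same items, so the sample-average item slope
# enters every subject's difference score as a shared offset -- which
# the one-sample t-test then mistakes for a real condition effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_subj, n_items = 2000, 20, 16
sd_item_slope, sd_noise = 0.5, 1.0   # item-by-condition variability, trial noise

false_positives = 0
for _ in range(n_sims):
    item_slope = rng.normal(0.0, sd_item_slope, n_items)  # true mean effect = 0
    # each subject's condition difference, averaged over the same items
    noise_a = rng.normal(0.0, sd_noise, (n_subj, n_items)).mean(axis=1)
    noise_b = rng.normal(0.0, sd_noise, (n_subj, n_items)).mean(axis=1)
    diffs = item_slope.mean() + noise_a - noise_b
    _, p = stats.ttest_1samp(diffs, 0.0)
    false_positives += p < 0.05

print(f"by-subjects false positive rate: {false_positives / n_sims:.2f} (nominal 0.05)")
```

A mixed model with by-item random slopes (or checking the by-items test alongside the by-subjects one) is what keeps that shared item noise from being counted as evidence — which is why the mixed model’s |t| < 2 is the answer to trust here.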

          Now, since I always work with graduate students or postdocs, their publications will not look as good as others’, and they will therefore not be selected for jobs when they are competing with people who publish in these top journals. Selection committees and funding agencies look at *where* a paper appeared, not what was in it.

          The only way I see to publish in top journals while staying honest is to publish methodology papers. “This is how we are doing things wrong” or “if you do this you will make the following mistake”. So that’s what I am doing; the actual scientific enterprise has to take a back seat for now. Everyone gets excited if one makes a methodology point; if you take the *exact same data* and make a scientific point instead, reject. This has been my experience.

          Everyone loves reading about how to do things right, but nobody really wants to do it, and for good reason. The only people who can do it are people who are so famous that it literally doesn’t matter what they write, they will get accepted. I am unfortunately not one of these people.

          My point is: Andrew’s comments, while correct, are not actionable. That’s why nothing much is changing. I know lots of people in psycholinguistics who are working honestly to do things right, but then their publication counts are not good enough to stay in academia, and they move to industry (not that the pressures there are much different), or just do something else with their lives.

          • Martha (Smith) says:

            Thanks for this comment. We need to put our heads, etc. together to try to figure out how to actually promote positive change. Maybe Andrew can devote a post (or several) to this?

  4. Bill Jefferys says:

    The most important word in Andrew’s first paragraph is one that he used several times: “decision”.

    The ultimate aim of much statistical analysis is, just as he says, that a decision ultimately has to be made to take one action or another.

    This means that it is not only our assessment of the probabilities of the various states of nature that is important (however we assess them), but also our evaluation of the consequences of taking the available courses of action under the different states of nature.

    It is the combination of these two things, not the probabilities alone, that tells us what the best decision is.

    And under some circumstances, even a modest probability of a particular state of nature can dictate a particular action, given a loss or utility function that sufficiently favors that action if that state of nature turns out to be true; the converse can also hold under other circumstances.
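The modest-probability point can be made concrete with toy numbers (mine, purely for illustration): the decision rule is to pick the action with the smaller probability-weighted loss, not the action matching the more probable state of nature.

```python
# Toy decision problem: two actions, two states of nature.
# The probabilities and losses are invented for illustration.
p_works = 0.30  # our probability that the treatment is effective -- modest
loss = {
    # loss[action][state]
    "treat":      {"works": 0.0,  "doesnt_work": 10.0},  # cost of a useless treatment
    "dont_treat": {"works": 80.0, "doesnt_work": 0.0},   # cost of a forgone benefit
}

def expected_loss(action, p):
    """Probability-weighted loss of an action, averaged over the states of nature."""
    return p * loss[action]["works"] + (1 - p) * loss[action]["doesnt_work"]

for action in loss:
    print(action, expected_loss(action, p_works))
best = min(loss, key=lambda a: expected_loss(a, p_works))
print("best action:", best)
```

With these losses, treating wins (expected loss 7 vs. 24) even though we are only 30% sure the treatment works; flip the asymmetry in the losses and the same 30% would dictate the opposite action.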

    All of which is a way of saying that, in my opinion, it is a serious problem that decision theory is completely ignored in many areas of research where it absolutely should not be.
