Duality between multilevel models and multiple comparisons adjustments, and the relevance of this to some discussions of replication failures

Pascal Jordan writes:

I stumbled upon a paper which I think you might find interesting to read or discuss: “Ignored evident multiplicity harms replicability—adjusting for it offers a remedy,” by Yoav Zeevi, Sofi Astashenko, and Yoav Benjamini. It deals with the replication crisis, more specifically with a sort of reanalysis of the Reproducibility Project in Psychology.

There are parts which I [Jordan] think overlap with your position on forking paths: for example, the authors argue that there are numerous implicit comparisons in the original studies which are not accounted for in the reporting of the results. The paper also offers a partial explanation of why social psychology is particularly troubled by low rates of replicability (according to the paper, the mean number of implicit comparisons is higher in social psychology than in cognitive psychology).

On the other hand, the authors adhere to the traditional hypothesis-testing paradigm (with corrections for implicit comparisons) which I know you are a critic of.

My reactions:

1. As we’ve discussed, there is a duality between multiple-comparisons corrections and hierarchical models: in both cases we are considering a distribution of possible comparisons. In practice that means we can get similar applied results using different philosophies, as long as we make use of relevant information. For example, when Zeevi et al., in the above-linked article, write of “addressing multiplicity,” I would consider multilevel modeling (as here) one way of doing this; the first simulation sketch after these three points illustrates the duality.

2. I think a key cause of unreplicable results is not the number of comparisons (implicit or otherwise) but rather the size of the underlying effects. Social and evolutionary psychology have been in this weird position where they design noisy studies to estimate underlying effects that are small (see here) and can be highly variable (the piranha problem). Put this together and you have a kangaroo situation: using a bathroom scale to weigh a feather, while the feather sits in the pouch of a jumping kangaroo (the second sketch below shows how this plays out). Multiple comparisons and forking paths make this all worse, as they give researchers who are clueless or unscrupulous (or both) a way of declaring statistical significance from data that are essentially pure noise, but I think the underlying problem is that they’re using noisy experiments to study small and variable effects.

3. Another way to say it is that this is a problem of misplaced rigor. Social psychology research often uses randomized experiments and significance tests: two tools for rigor. But all the randomization in the world won’t get you external validity, and all the significance testing in the world won’t save you if the signal is low relative to the noise.
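To make point 1 concrete, here’s a minimal simulation sketch (in Python, with made-up numbers, nothing taken from Zeevi et al.): a Bonferroni correction handles multiplicity by tightening the significance threshold as the number of comparisons J grows, while an empirical-Bayes multilevel model handles it by partially pooling the J estimates toward their grand mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Illustrative setting (invented numbers): J comparisons whose true
# effects come from a common population, each measured once with noise.
J, tau, sigma = 20, 0.5, 1.0            # between-effect sd, measurement sd
theta = rng.normal(0, tau, size=J)      # true effects
y = rng.normal(theta, sigma)            # one noisy estimate per comparison

# Multiple-comparisons view: Bonferroni-adjusted z-tests.
p = 2 * stats.norm.sf(np.abs(y) / sigma)
bonferroni_claims = p < 0.05 / J        # threshold tightens as J grows

# Multilevel view: empirical-Bayes partial pooling. Estimate the
# between-effect variance from the spread of the y's, then shrink
# each estimate toward the grand mean.
tau2_hat = max(np.var(y, ddof=1) - sigma**2, 0.0)
shrink = tau2_hat / (tau2_hat + sigma**2)
theta_post = y.mean() + shrink * (y - y.mean())  # posterior means
post_sd = np.sqrt(shrink) * sigma                # posterior sd (mu, tau fixed)
mlm_claims = np.abs(theta_post) > 2 * post_sd    # 95% interval excludes zero

print(f"Bonferroni flags {bonferroni_claims.sum()} of {J}; "
      f"multilevel model flags {mlm_claims.sum()} of {J}")
```

Both procedures use the same piece of information, the distribution of the J comparisons; one tightens a threshold, the other shrinks the estimates, and neither needs the other’s philosophy to get there.

And for point 2, a sketch of what happens when a noisy design chases a small effect (again, purely illustrative numbers): conditional on crossing the significance threshold, the estimate is necessarily a large overestimate, and its sign is close to a coin flip. This is the type M (magnitude) and type S (sign) error problem.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative regime: a small true effect measured with a large
# standard error (numbers invented for the demonstration).
true_effect, se, n_sims = 0.1, 1.0, 100_000
est = rng.normal(true_effect, se, size=n_sims)     # one estimate per "study"
signif = np.abs(est) > stats.norm.ppf(0.975) * se  # two-sided p < 0.05

power = signif.mean()
type_m = np.abs(est[signif]).mean() / true_effect  # exaggeration ratio
type_s = (est[signif] < 0).mean()                  # wrong-sign rate

print(f"power ~ {power:.2f}; significant estimates exaggerate the "
      f"effect ~{type_m:.0f}-fold; {type_s:.0%} have the wrong sign")
```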

So, just to be clear, “adjusting for multiplicity” can be done using multilevel modeling without any reference to p-values; thus, the overall points of the above-linked paper are more general than any specific method they might propose (just as, conversely, the underlying ideas of any paper of mine on hierarchical modeling could be transmuted into a statement about multiple comparisons methods). And no method of analysis (whether it be p-values, hierarchical modeling, whatever) can get around the problem of studies that are too noisy for the effects they’re trying to estimate.
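To see that last point in code: run the partial-pooling estimator from the first sketch on the noisy-small-effects regime of the second. The estimated between-effect variance comes out near zero, so the model shrinks everything toward the grand mean; it doesn’t manufacture spurious findings the way unadjusted testing can, but it can’t manufacture information either.

```python
import numpy as np

rng = np.random.default_rng(7)

# Partial pooling as in the first sketch, but now the true effects are
# much smaller than the measurement noise (illustrative numbers again).
J, tau, sigma = 20, 0.1, 1.0
theta = rng.normal(0, tau, size=J)
y = rng.normal(theta, sigma)

tau2_hat = max(np.var(y, ddof=1) - sigma**2, 0.0)
shrink = tau2_hat / (tau2_hat + sigma**2)   # typically near 0 in this regime
theta_post = y.mean() + shrink * (y - y.mean())

print(f"shrinkage factor {shrink:.2f}; posterior means span "
      f"[{theta_post.min():.2f}, {theta_post.max():.2f}]")
```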

1 thought on “Duality between multilevel models and multiple comparisons adjustments, and the relevance of this to some discussions of replication failures”

  1. “the underlying problem is that they’re using noisy experiments to study small and variable effects.”

    Doncha mean “hypothesized” effects? Or, still more accurately, “postulated” or “potential” or “possible” effects? Or, still more accurately, “things-we-could-imagine-might-happen” effects? Like conservatives are better at arm wrestling, shark attacks predict the number of potholes on my street, or a half-hour video in eighth grade leads to higher lifetime earnings.
