Why I don’t (usually) care about multiple comparisons

Statisticians often get worried about multiple comparisons and have various procedures for adjusting p-values. It’s only very rarely come up in my own research, though. Here’s a presentation explaining why, from a workshop of educational researchers to which I was invited by Rob Hollister, an economist and policy analyst at Swarthmore College.

The punch line is that multilevel models do well by partially pooling the point estimates, whereas classical multiple comparisons methods leave the estimates unchanged and just widen the confidence intervals.
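Here is a rough illustration of that contrast (a minimal sketch in R with assumed values of tau and sigma, not the computations from the linked talk): the partially pooled estimates shrink toward the grand mean and end up closer to the truth on average, while the Bonferroni correction keeps the raw estimates and only widens the intervals.

```r
# Minimal sketch: partial pooling vs. Bonferroni under an assumed normal
# model with known tau (sd of the true effects) and sigma (sd of each
# estimate). Illustrative values only.
set.seed(123)
J     <- 20
tau   <- 0.5
sigma <- 1
theta <- rnorm(J, 0, tau)        # true group effects
y     <- rnorm(J, theta, sigma)  # classical point estimates

# Partial pooling: shrink each estimate toward the grand mean
shrink <- tau^2 / (tau^2 + sigma^2)
pooled <- mean(y) + shrink * (y - mean(y))

# Bonferroni: identical point estimates, wider intervals
z_bonf  <- qnorm(1 - 0.05 / (2 * J))
bonf_ci <- cbind(y - z_bonf * sigma, y + z_bonf * sigma)

# Mean squared error: the pooled estimates sit closer to the truth
c(raw = mean((y - theta)^2), pooled = mean((pooled - theta)^2))
```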

P.S. In answer to Vasishth’s comment: we did some computations and simulations to illustrate this point in this article (with Francis Tuerlinckx) that appeared in Computational Statistics in 2000. See also this discussion with Nick Longford.

4 thoughts on “Why I don’t (usually) care about multiple comparisons”

  1. This is very interesting, and conceptually it makes sense. However, I think it would really help to clarify the issue with some simulations. The basic idea would be to set up fake population(s) where the truth is known, and then to repeatedly sample from them to see if one does get significant differences in multiple comparisons more often than 5% of the time.

    Depending on which book one reads, one usually gets either a conceptual argument such as the one you gave, or (more often) just a "do this, don't do that" type of cookbook instruction. It would be cool to see the above argument discussed with simulations. Maybe I'm the only one out there, but for me simulations have a convincing power that goes beyond logic and reasoning.
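    Something along those lines can be done in a few lines (a minimal sketch with a hypothetical setup: five groups whose true means are all equal, all pairwise t-tests): with ten comparisons, at least one comes out "significant" far more often than 5% of the time.

    ```r
    # Minimal sketch of the simulation described above: 5 groups with
    # identical true means, all pairwise t-tests, repeated many times
    # to estimate how often at least one comparison is "significant".
    set.seed(1)
    n_sims <- 2000   # hypothetical choices throughout
    n_grp  <- 5
    n_obs  <- 30
    any_sig <- replicate(n_sims, {
      x <- matrix(rnorm(n_grp * n_obs), ncol = n_grp)  # truth: no differences
      pairs <- combn(n_grp, 2)                         # 10 pairwise comparisons
      p <- apply(pairs, 2, function(ij) t.test(x[, ij[1]], x[, ij[2]])$p.value)
      any(p < 0.05)
    })
    mean(any_sig)  # substantially above the nominal 0.05
    ```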

  2. I particularly like your discussion of Type S vs. Type M error. It's definitely a more practical view than the Type I/II framework. You're fortunate that most of the effects you are looking for are large sign differences. And I feel for those who are looking for small positive differences, which is often the case when, for example, one is testing a direct mail piece with or without a photo. And if the test is made sufficiently complex, one is truly staring at a large 2D matrix of pairwise differences and fishing out the significant ones!
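    To make the Type S / Type M distinction concrete, here is a minimal sketch (with an assumed small true effect and a noisy estimator, values purely illustrative): among the estimates that reach significance, one can check how often the sign is wrong (Type S) and by how much the magnitude is exaggerated (Type M).

    ```r
    # Minimal sketch: Type S and Type M error under low power, assuming
    # a small true effect and a noisy estimator (illustrative values).
    set.seed(7)
    effect <- 0.1                     # small true effect
    se     <- 0.5                     # standard error of each estimate
    est    <- rnorm(1e5, effect, se)  # replicated estimates
    sig    <- abs(est) > 1.96 * se    # the "statistically significant" ones
    type_s <- mean(est[sig] < 0)            # wrong sign, given significance
    type_m <- mean(abs(est[sig])) / effect  # exaggeration ratio
    c(type_s = type_s, type_m = type_m)
    ```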

  3. My take on the topic is the same as the one described by Andrew and Francis Tuerlinckx. When there are several hypotheses to test, such as θ_1 > θ_2 & θ_1 > θ_3, one can simply explore the probability that this is the case in the posterior (say, by checking how often the hypothesis holds in the posterior samples), so as to obtain a quantity very much like the p-value. Depending on the dependence between the hypotheses, the probability of the conjunction being true is going to be lower than the probability that either of them is true. The real question is whether we care about such conjunctions. I've explored this in Information-Theoretic Exploration and Evaluation of Models.

    Hypotheses like θ_1 = θ_2 strike me as infinitely unlikely, but still, one can decide to assign a non-zero prior probability to this specific event.
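    In code, with made-up stand-in posterior draws, the conjunction probability is just the share of draws in which both inequalities hold, and it can never exceed either marginal probability:

    ```r
    # Minimal sketch with hypothetical posterior draws for theta_1..theta_3.
    set.seed(42)
    n_draws <- 10000
    theta <- cbind(rnorm(n_draws, 0.3, 0.2),  # stand-in draws for theta_1
                   rnorm(n_draws, 0.0, 0.2),  # theta_2
                   rnorm(n_draws, 0.1, 0.2))  # theta_3
    p12  <- mean(theta[, 1] > theta[, 2])
    p13  <- mean(theta[, 1] > theta[, 3])
    both <- mean(theta[, 1] > theta[, 2] & theta[, 1] > theta[, 3])
    c(p12 = p12, p13 = p13, conjunction = both)  # both <= min(p12, p13)
    ```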

  4. The article: absolutely amazing. But your point is only valid for situations where tau/sigma is near zero. When it's 2, say, it doesn't really matter whether we use the conventional methods or the Bayesian approach (although I cannot imagine a situation where tau/sigma = 2, i.e., sigma is the "true" value of the sample standard deviation and tau is the true value of the standard deviation of the distribution of means; how could the sample sd be half that of the population? Perhaps I misunderstood what sigma and tau are).

    The practical point I take away is that this paper gives a good reason to report highest posterior density intervals rather than, say, p-values (which are proving hard to obtain anyway for lmer, given the difficulty of determining the degrees of freedom).

    Incidentally, regarding an earlier comment you made about not trusting the HPDintervals function in R because you do not know exactly what it does: I have no reason to mistrust HPDintervals, just as I have no reason to mistrust your version :).
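    For what it's worth, an HPD interval is simple enough to compute by hand from posterior draws that the two versions can be checked against each other. A minimal toy implementation (mine, not the packaged one): among all intervals containing a given fraction of the sorted draws, take the narrowest.

    ```r
    # Hand-rolled HPD interval from posterior draws, as a sanity check
    # against a packaged version.
    hpd <- function(draws, prob = 0.95) {
      draws <- sort(draws)
      n     <- length(draws)
      k     <- floor(prob * n)                     # draws the interval must span
      width <- draws[(k + 1):n] - draws[1:(n - k)]
      i     <- which.min(width)                    # narrowest candidate interval
      c(lower = draws[i], upper = draws[i + k])
    }
    hpd(rnorm(1e4, mean = 1, sd = 2))  # toy draws; roughly 1 +/- 1.96 * 2
    ```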
