Skip to content

Three-warned is three-armed

Simon Gates writes:

Here is a paper just published in JAMA, on correction for multiple testing, and the clinical trial it refers to (also, I’ve just noticed, relevant to yesterday’s post [this one, I think. — AG]). This sort of sequential testing (and non-testing) is quite common, for example in three-armed trials (not saying I agree with it!), but this is the most extreme example I have seen. The starting point seems to be that the only important thing that comes out of the trial is whether or not you can call it “significant.”

There are some handy features here, like the ability to get different conclusions from the same data, just by listing the outcomes in a different order, and the way that you could end up spending loads of time and effort collecting outcome data, that you don’t then analyse.

I’d like to have seen JAMA be a bit more critical about this – actually I’m surprised they aren’t, as one of the authors has strong Bayesian credentials, but maybe he’d say he’s just describing the method, not advocating it. But I can see people reading this and thinking it’s a great idea to apply to their next trial.

I agree with Gates that this focus on statistical significance feels like a mistake. The goal should be to learn and to be able to make accurate predictions and effective decisions about treatments, not to extract statements that give the air of certainty.

P.S. The title of this post is a reference to an old math story that perhaps some readers will recall.


  1. Ryan King says:

    If they are exchangeable under the null, you can avoid the sequence dependency by ranking the tests and using a test calibrated on the relevant order statistics to gatekeep. For a FDR calculation you have to do a jointly calibrated test like permutations. Gatekeepers are super clever when the subsequent tests are impossible or irrelevant with the null of an earlier test.

  2. Fernando says:

    It seems to me the ordering in the example above is rather arbitrary. An alternative is sequentially partitioned hypotheses, which relies on a natural nesting of hypotheses. See Chp 19 in Design of Observational Studies By Paul R. Rosenbaum

Leave a Reply