## The Pandora Principle in statistics — and its malign converse, the ostrich

The Pandora Principle is that once you’ve considered a possible interaction or bias or confounder, you can’t un-think it. The malign converse is when people realize this and then design their studies to avoid ever being in a position where they would have to consider some potentially important factor. For example, suppose you’re considering a policy intervention that can be done in several different ways, or conducted in several different contexts. The recommended approach, if possible, is to try out different realistic versions of the treatment in various realistic scenarios; you can then estimate an average treatment effect and also do your best to estimate variation in the effect (recognizing the difficulties inherent in that famous 1/16 efficiency ratio). An alternative, which one might call the reverse-Pandora approach, is to do a large study with just a single precise version of the treatment. This can give a cleaner estimate of the effect in that particular scenario, but extending it to the real world will require some modeling or assumption about how the effect might vary. Going full ostrich, one could simply carry over the estimated treatment effect from the simple experiment and not consider any variation at all. The idea is that if you’d considered two or more flavors of treatment, you’d really have to consider the possibility of variation in the effect, and propagate that into your decision making. But if you only consider one possibility, you can ostrich it and keep Pandora at bay. The ostrich approach might get you a publication and even some policy inference, but it’s bad science and, I think, bad policy.
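The arithmetic behind that 1/16 ratio can be checked with a quick simulation. In a balanced 2×2 design (treatment crossed with a subgroup), the interaction estimate has twice the standard error of the average-treatment-effect estimate; if the interaction you care about is also half the size of the main effect, you need 4² = 16 times the sample size to estimate it with comparable precision. A minimal sketch, with hypothetical effect sizes chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical effect sizes (for illustration only):
main = 0.5    # average treatment effect
inter = 0.25  # difference in treatment effect between subgroups (half the main effect)

n_cell = 250   # observations per treatment-x-subgroup cell
sims = 2000
est_main, est_inter = [], []

for _ in range(sims):
    y = {}
    for t in (0, 1):          # control / treated
        for g in (0, 1):      # subgroup
            mu = t * (main + (g - 0.5) * inter)
            y[t, g] = rng.normal(mu, 1.0, n_cell)
    te_g0 = y[1, 0].mean() - y[0, 0].mean()  # effect in subgroup 0
    te_g1 = y[1, 1].mean() - y[0, 1].mean()  # effect in subgroup 1
    est_main.append((te_g0 + te_g1) / 2)     # average treatment effect
    est_inter.append(te_g1 - te_g0)          # interaction

se_main = np.std(est_main)
se_inter = np.std(est_inter)
print(se_main, se_inter, se_inter / se_main)  # ratio comes out close to 2
```

The factor-of-2 in standard error, combined with the half-size effect, is where the factor of 16 in required sample size comes from.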

That said, there’s no easy answer, as there will always be additional possible confounding factors that you will not have been able to explore. That is, among all the scary contents of Pandora’s box, one thing that flies out is another box, and really you should open that one too . . . that’s the Cantor principle, which we encounter in so many places in statistics.

tl;dr: You can’t put Pandora back in the box. But really she shouldn’t’ve been trapped in there in the first place.

1. Shravan says:

Even if one does a two-condition study, taking all plausible sources of variability into account quickly becomes a nightmare. Shall I include trial effects, and trial:condition interactions? Surely subjects must be getting fatigued over time, or learning to strategize. Should I include the amount of alcohol they drank the night before, or the amount of sleep they had? Gender? Age? I generally ignore all these variables in my experiments because it would become a mess.
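Shravan’s point about this becoming a mess can be made concrete with a parameter count: if you give every varying term a full random-effects covariance structure, the number of variance–covariance parameters grows quadratically in the number of varying terms. A back-of-the-envelope sketch (the counts are standard; the choice of k values is just for illustration):

```python
def n_varcov_params(k):
    """Free parameters in a full k x k random-effects covariance
    matrix: k variances plus k*(k-1)/2 pairwise covariances."""
    return k * (k + 1) // 2

# k = random intercept plus (k - 1) varying slopes per subject
for k in (1, 2, 4, 8):
    print(k, n_varcov_params(k))  # 1 -> 1, 2 -> 3, 4 -> 10, 8 -> 36
```

So adding trial effects, trial-by-condition interactions, fatigue, and a few covariates to the random-effects structure quickly demands estimating dozens of variance parameters from a handful of subjects.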

2. Kaiser says:

Another advantage of the ostrich approach is the built-in defense against the replication brigade. If it doesn’t replicate, it must be because the single precise treatment wasn’t replicated precisely.

In some sense, the “ostrich approach” isn’t always a terrible idea.

For example, any type of engineering begins in the lab. This is a world where things can be isolated and altered in an extremely precise, controlled manner. Then one slowly begins testing a greater and greater variety of inputs until the thing seems ready for the real world. In fact, I think you can argue that almost all science advances in this manner.

Of course, it’s really important to realize the *speed* at which your knowledge advances. In the lab, you can often collect data relatively cheaply, change some parameters, collect more data, and repeat. If you’re collecting data on live subjects, the time between tweaks to the situational parameters is often measured in years (i.e., the next study) rather than days. And if your measurements of those isolated parameters are very noisy . . . well, I don’t need to preach to the choir here.

Along similar lines Paul Rosenbaum (http://www.jstor.org/stable/2676761) makes the argument that observational studies work best when they are designed to test a well-specified theory. This is how “hard” science progresses in labs (well, that’s over-simplified, but so is everything). Then external validity is less of an issue. I wonder to what extent the same argument applies to randomized experiments.
> A laboratory experiment makes no effort to draw up a frame comprising all circumstances, locations and times when the treatment might be applied and to draw a representative sample of such circumstances. Rather, the laboratory experiment examines the theory under highly unrepresentative circumstances, namely, circumstances in which sensitive, calibrated measuring instruments are used in an environment carefully freed of forces that might intrude on the experiment, in which the treatment is delivered at doses sufficient to produce dramatic effects if the theory is correct or to produce an equally dramatic absence of effect if the theory is incorrect.

Of course, we don’t just want to test theories, we also want to predict the effects of our intervention in varying circumstances, so treatment effect heterogeneity, etc., still matters. But anyway “a reader” has a good point.