This one has come up before but it’s worth a reminder.
Stephen Senn is a thoughtful statistician and I generally agree with his advice but I think he was kinda wrong on this one. Wrong in an interesting way.
Senn’s article is from 2002 and it is called “Power is indeed irrelevant in interpreting completed studies.” His point is that you perform power analysis (that is, figuring out the probability that a study will attain statistical significance at some predetermined level, conditional on some assumptions about effect sizes and measurement error) in order to design a study. Once the data have been collected, so the reasoning goes, the power calculation is irrelevant.
As Senn puts it: “The definition of a medical statistician is one who will not accept that Columbus discovered America because he said he was looking for India in the trial plan. Columbus made an error in his power calculation—he relied on an estimate of the size of the Earth that was too small, but he made one none the less, and it turned out to have very fruitful consequences.”
The Columbus example works for Senn because, although Columbus’s theoretical foundation was wrong, he really did find a route to America. Low-power statistical studies are different: there, “statistically significant” and thus publishable results can be spurious. And the lower the power, the more likely it is that a statistically significant estimate is vastly inflated and possibly in the wrong direction.
In a low-power study, the seeming “win” of statistical significance can actually be a trap. Economists speak of a “winner’s curse” in which the highest bidder in an auction will, on average, be overpaying. Research studies—even randomized experiments—suffer from a similar winner’s curse: by focusing on comparisons that are statistically significant, we (the scholarly community as well as individual researchers) get a systematically biased and over-optimistic picture of the world.
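Here’s a minimal simulation sketch of that winner’s curse (Python; the true effect of 2 and standard error of 10 are made-up numbers chosen to give low power): it replays the same study many times and shows that the replications which happen to reach statistical significance report estimates that are wildly exaggerated, and not infrequently have the wrong sign.

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 2.0   # hypothetical small true effect
se = 10.0           # hypothetical standard error: a low-power design
n_sims = 100_000

# Many replications of the same study, each giving an unbiased but noisy estimate.
estimates = rng.normal(true_effect, se, size=n_sims)

# The "winners": estimates that are statistically significant at the 5% level.
significant = estimates[np.abs(estimates) > 1.96 * se]

print("power:", len(significant) / n_sims)
print("average |significant estimate|:", np.abs(significant).mean(), "vs. true effect", true_effect)
print("share of significant estimates with the wrong sign:", (significant < 0).mean())
```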
John Carlin and I argue in our recent paper that post-data design analysis (I prefer not to speak of power analysis because power is all about declarations of statistical significance, and I’m more interested in type M and type S errors, that is, errors of magnitude and of sign) can be a key step in avoiding this winner’s curse.
In short: design analysis can help inform a statistical data summary.
From a frequentist perspective, design analysis—before or after data are collected—gives you a distribution or set of distributions on which to condition, to evaluate frequency properties of statistical procedures. (In frequentist analysis you work with distributions conditional on the true parameter value theta, and the key step of design analysis is to posit some reasonable range of possibilities for theta.)
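To make that concrete, here is a minimal sketch of such a design analysis (Python; the function name, defaults, and simulation approach are my own illustration, not code from our paper): posit a plausible true effect and the study’s standard error, and it returns the implied power, the type S error rate, and the type M (exaggeration) ratio.

```python
import numpy as np
from scipy.stats import norm

def design_analysis(true_effect, se, alpha=0.05, n_sims=1_000_000, seed=0):
    """Frequency properties of an estimate ~ Normal(true_effect, se),
    conditional on a posited (nonzero) true effect."""
    z_crit = norm.ppf(1 - alpha / 2)   # significance threshold on the z scale
    z0 = abs(true_effect) / se         # posited effect on the z scale

    # Power and the type S error rate have closed forms for a normal estimate.
    power = norm.sf(z_crit - z0) + norm.cdf(-z_crit - z0)
    type_s = norm.cdf(-z_crit - z0) / power   # wrong-sign share among significant results

    # Type M (exaggeration) ratio by simulation: average |estimate| among
    # statistically significant estimates, relative to the posited true effect.
    rng = np.random.default_rng(seed)
    estimates = rng.normal(true_effect, se, size=n_sims)
    significant = estimates[np.abs(estimates) > z_crit * se]
    exaggeration = np.abs(significant).mean() / abs(true_effect)

    return power, type_s, exaggeration
```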
From a Bayesian perspective, design analysis works because it’s a way to include prior information and indeed can be thought of as an approximation to a fully Bayesian analysis.
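As a toy illustration of that last point (all numbers hypothetical), a normal prior combined with a normal likelihood gives a closed-form posterior: the same external information that would feed a design analysis instead pulls the noisy estimate toward the prior.

```python
prior_mean, prior_sd = 0.0, 5.0   # hypothetical external knowledge: effects are modest
estimate, se = 30.0, 20.0         # hypothetical noisy study result

# Conjugate normal-normal update: precisions add, means are precision-weighted.
post_precision = 1 / prior_sd**2 + 1 / se**2
post_mean = (prior_mean / prior_sd**2 + estimate / se**2) / post_precision
post_sd = post_precision ** -0.5

print(f"posterior: {post_mean:.1f} +/- {post_sd:.1f}")  # heavily shrunk toward the prior mean
```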
In any case, the hypothesized parameter values in the design analysis should come from external subject-matter knowledge, not from the noisy point estimate from the data at hand. For example, if you run a little experiment and you get an estimate of 42 with a standard error of 20, you should probably not base your design analysis on an assumed underlying effect size of 42 or even 22. So, sure, if that’s what you were going to do, I agree with Senn that this is a bad idea.
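Running the design_analysis sketch from above on this example makes the contrast explicit (the assumed true effect of 10 is purely hypothetical, a stand-in for what external subject-matter knowledge might suggest):

```python
# Design analysis based on the noisy point estimate itself gives a rosy picture:
# decent power, essentially no sign-error risk, only modest exaggeration.
print(design_analysis(true_effect=42, se=20))

# Based on a hypothetical externally informed effect of 10, the picture changes:
# power under 10%, a non-negligible chance of a sign error, and several-fold
# exaggeration among whatever estimates do reach statistical significance.
print(design_analysis(true_effect=10, se=20))
```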
Anyway, I’m not trying to criticize Senn for writing a paper in 2002 that does not anticipate the argument from our paper published twelve years later. I just think it’s interesting that a recommendation which in its context made complete sense—on both theoretical and practical grounds—now needs updating given our new understanding of how to interpret published results.