## Some Implications of Factoring Cost into Statistical Power Calculations

Jim Dannemiller writes:

I ran across your discussion on retrospective power and power calculations in general, and I thought that you might be interested in this manuscript [link fixed] that Ron Serlin and I are working on at present.

Their idea is to formally put costs into the model and then optimize, instead of setting Type 1 and Type 2 error rates ahead of time. I’m already on record as not being a fan of the concepts of Type 1 and Type 2 errors; that said, we do work in that framework in chapter 20 of our book, and my guess is that it does make sense to put costs into the model explicitly. So I imagine this article by Dannemiller and Serlin is a step forward.

P.S. Jim also said he liked our Teaching Statistics book!

1. Daniel Lakeland says:

This perspective arises naturally in forensic engineering investigation.

Imagine you are an installer of some product, and you are being sued for faulty installations among some class of people (say everyone in some county who used your services).

As part of this process you would like to balance the cost of investigating and collecting data to build your case, and the benefit you derive from that data. Imagine for example that we can put a unit cost C on an incorrectly installed product, and the goal is then to investigate and show that the proportion of incorrectly installed products is p rather than whatever the plaintiff claims (usually p' = 1).

There is then a natural total cost associated with the process: I + C p + C*1.96*s

I is the investigation cost, C is the unit cost per incorrectly installed product, p is the proportion of incorrectly installed products, and s is the standard error of p. Since juries will in general give the benefit of the doubt to the plaintiff, you will wind up paying based on something like p95, the 95th percentile of your p estimate.

Now, I and s naturally involve N, the number of units observed. In a simple model, I is a linear function of N and s is proportional to 1/sqrt(N). The optimum N is the one that minimizes this total cost… voila.
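A minimal sketch of that minimization, under assumptions not stated in the comment: the investigation cost is taken as a hypothetical `cost_fixed + cost_per_unit * N`, and s is taken as the binomial standard error sqrt(p(1-p)/N). Setting the derivative of the total cost with respect to N to zero then gives N = (C * 1.96 * sqrt(p(1-p)) / (2 * cost_per_unit))^(2/3). All dollar figures below are made up for illustration.

```python
import math

Z = 1.96  # multiplier taken from the cost formula in the comment


def total_cost(n, cost_fixed, cost_per_unit, c, p, z=Z):
    """Total cost I + C*p + C*z*s for a sample of n units.

    Assumptions (not from the original comment): I = cost_fixed + cost_per_unit*n
    and s = sqrt(p*(1-p)/n), the binomial standard error of the estimated proportion.
    """
    s = math.sqrt(p * (1.0 - p) / n)
    investigation = cost_fixed + cost_per_unit * n
    return investigation + c * p + c * z * s


def optimal_n(cost_per_unit, c, p, z=Z):
    """Continuous N that minimizes total_cost under the same assumptions.

    Solves cost_per_unit - 0.5*c*z*sqrt(p*(1-p))*N**(-1.5) = 0 for N.
    The fixed cost drops out because it does not depend on N.
    """
    sigma = math.sqrt(p * (1.0 - p))
    return (c * z * sigma / (2.0 * cost_per_unit)) ** (2.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical figures: $500 per inspected unit, C = $1,000,000, guessed p = 0.2.
    n_star = optimal_n(cost_per_unit=500.0, c=1_000_000.0, p=0.2)
    print(f"optimal N is roughly {n_star:.1f}")
    print(f"total cost there: {total_cost(n_star, 10_000.0, 500.0, 1_000_000.0, 0.2):,.0f}")
```

Because p is unknown before the investigation, a planning value has to be guessed; the optimum moves only with the 2/3 power of C/cost_per_unit, so it is not very sensitive to modest errors in those inputs.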

Surprisingly, almost nobody in the construction litigation industry seems to be aware of this formulation…

2. jsalvati says:

Is this not a common approach?

I would have thought that a decision theoretic approach would be the first approach that statisticians would use in trying to find optimal sample sizes, and that applying such frameworks would be routine.

In fact, googling "optimal sample size" seems to yield quite a few articles about people doing just that.

3. Jim Dannemiller says:

I would say, no. That is basically why Ron Serlin and I wrote the paper. Instead, I was taught, as were most of my colleagues, simply to pick an arbitrary power (usually .8), an effect size, a Type I error probability, and calculate or look up the sample size given those parameters. I can't think of a single time in my professional career (25+ years) that cost or anything like an optimal sample size was ever discussed. Instead, it is typically the scenario above or previous experience with similar experiments that determine the planned sample size. There is yet another strategy of using some arbitrary but reasonable sample size, and adding subjects one at a time if the p-value is close to .05 (known as chasing p). Needless to say, something a little more rational would seem to be in order, so Ron Serlin and I set out to try to do that.
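For contrast, the conventional calculation described above can be sketched in a few lines. This is only an illustration, assuming a two-sided, two-sample comparison of means with a standardized effect size and the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2 per group; the function name `n_per_group` is hypothetical. Note that cost appears nowhere in it.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Uses the normal approximation; effect_size is the standardized
    difference in means (difference divided by the common SD).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)              # quantile matching the target power
    return math.ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # The usual defaults: alpha = .05, power = .8, a "medium" effect of 0.5.
    print(n_per_group(0.5))
```

Every input here is a convention or a guess, which is the point of the complaint: nothing in the formula reflects what an extra subject costs or what a wrong conclusion costs.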

4. Keith O'Rourke says:

Jim:

My first job involved getting at the cost-effectiveness of funding clinical trials (1985), but even then it was an old topic. Apparently Yates did something very similar in an attempt to get more funding for the Rothamsted Experimental Station, and this is summarized and reviewed in Cox and Hinkley's book on Theoretical Statistics.

Also, there is much current work in Health Economics by Bayesian authors.

But it is perhaps not at the top of most statisticians' minds. The worst example I know of was a conventional 80% power sample size calculation that led to only a subsample of clinical trial participants being surveyed. All that was saved was postage (trial participants apparently enjoyed answering the survey), the sample turned out to be too small, AND the study can never be done again for ethical reasons.

Keith

5. Andrew Gelman says:

Keith,

Exactly. We know that we _can_ do such calculations, but we almost never do. I'm as bad an offender as anybody else. Many times I've done this standard power calculation because somebody wanted it for a grant application.

6. jsalvati says:

Jim,

That is a little bit shocking to me. I have always been on the lookout for a good solid book on decision theory as applied to experimental design because I figured it would be pretty useful to me as a process engineer (in training) because testing is often expensive and you frequently want to come up with a scheme for routine testing. I guess now I know why I never found one.

7. I'm guilty of this too. (I really like the manuscript!) The two areas where it would be very useful to me (i.e., my clients) are in the planning of microarray experiments and ecological field experiments. Microarrays are very expensive, and the client always wants to use the minimum they can get away with. Ecological field experiments are usually not very expensive, as they are usually very low-tech. But replication is a problem because of the spatial scale of the experiment, e.g., rainforest fragments at the landscape scale. The experiments usually run over several years and require a lot of manpower (student power!) to monitor. Explicit incorporation of costs could help the client to determine whether they should really run the experiment or apply for more grant money. I never got much decision theory in my statistics education. That's partly why I'm working my way slowly through Berger's classic text.