Erik van Zwet explains the Shrinkage Trilogy

The Shrinkage Trilogy is a set of three articles written by van Zwet et al.:

1. The Significance Filter, the Winner’s Curse and the Need to Shrink at http://arxiv.org/abs/2009.09440 (Erik van Zwet and Eric Cator)

2. A Proposal for Informative Default Priors Scaled by the Standard Error of Estimates at http://arxiv.org/abs/2011.15037 (Erik van Zwet and Andrew Gelman)

3. The Statistical Properties of RCTs and a Proposal for Shrinkage at http://arxiv.org/abs/2011.15004 (Erik van Zwet, Simon Schwab and Stephen Senn)

To help out, van Zwet also prepared this markdown file explaining the details. Enjoy.

14 thoughts on “Erik van Zwet explains the Shrinkage Trilogy”

  1. The idea from the writeup that 80% power is not achieved because med research is hard is, I think, a bit naive.

    The way power calcs are done is that someone figures out how much money they are likely to get approval for, and then, given that the power has to be 80% by policy, backs out the effect size they need to claim is realistic so as to justify the money they are asking for (a back-of-the-envelope sketch of this follows below).

    It’s like Lenin said…. I am the walrus.
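    As a back-of-the-envelope sketch of that budget-first calculation (my own illustration with assumed numbers, not anything from the papers): fix the per-arm sample size the budget will cover and solve the usual two-sample power calculation for the effect size instead.

    ```r
    # Assumed: the budget covers 50 participants per arm of a two-arm trial.
    n_affordable <- 50
    # Solve the standard two-sample t-test power calculation for the effect
    # size (in sd units, since sd = 1 by default) that gives 80% power.
    power.t.test(n = n_affordable, power = 0.80, sig.level = 0.05,
                 type = "two.sample", alternative = "two.sided")$delta
    # About 0.57 sd: the effect one has to claim is "realistic" to justify the ask.
    ```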

    • In my experience, this is not the way it usually goes. That might happen for the early studies in an area. But it doesn’t take long for some effect size to become “the standard” in a given area. That “standard effect size” may well have been established in the (bogus) way that Daniel Lakeland says. But once it’s established, everybody else has to use it and that parameter of the sample size calculation is now fixed. Also fixed, in most funding mechanisms, is the amount of money you can get. (And in some situations the sample size itself is capped by logistical limitations.)

      So the manipulable parameters are cost per participant to execute the design, and the outcome variance. Outcome variance can be manipulated by changing inclusion/exclusion criteria (which, of course, comes at the expense of generalizability, and also the likelihood that the true effect size will be different from what you expect). Cost per participant can be manipulated by changing the duration or intensity of follow-up, intensity of interventions, choice of outcome measure, etc. These manipulations may also have effects on outcome variance and may also alter the true effect size. But it is these parameters, not the targeted effect size, that are typically manipulated.

      Also, the power that review panels look for is in flux these days, and while 80% might still be acceptable, claiming 90% power is better grantsmanship.

      So while I agree that power/sample size calculations in medical research are contaminated by non-scientific considerations in order to meet budgetary constraints, the ways in which it is done are more diverse than he sets out.

      • Clyde is right, I was thinking mainly of early stages in a particular field. Once you’ve got a couple of medicines or treatments, you have to match those effect sizes. One thing we both agree on is that the grant size is pretty much fixed, so if you want to hit 80-90% power you’re forced to basically do a shoddier job on more people (i.e. measure things fewer times, use shorter follow-up times, or change outcome measures to things that occur more often and hence are easier to detect with high power. Maybe, for example, changes in cholesterol levels and in side-effect frequency rather than, say, heart attack frequency and all-cause mortality, which is what people actually care about).

        Instead of thinking in terms of power analysis (i.e. the probability of declaring statistical significance if there is a real effect of a certain size), the world would be much better off if we looked at the Bayesian information added, i.e. how concentrated the posterior will be. It’s useful to be able to say “regardless of what the effect size is, provided it’s in the high-probability region of the prior, after doing this study we will have a posterior whose width is less than W.”
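        A minimal sketch of this posterior-width criterion, assuming a normal-normal model with made-up numbers (the prior sd, outcome sd and target width are not from the papers): with a N(0, tau^2) prior on the effect and an unbiased estimate with standard error SE, the posterior sd is known before the data arrive and does not depend on the true effect at all.

        ```r
        tau   <- 0.5     # prior sd of the treatment effect (assumed)
        sigma <- 1       # outcome sd (assumed)
        W     <- 0.2     # target posterior sd (assumed)
        n_per_arm <- 20:500
        se      <- sigma * sqrt(2 / n_per_arm)     # SE of the difference in two means
        post_sd <- 1 / sqrt(1 / tau^2 + 1 / se^2)  # no dependence on the true effect
        min(n_per_arm[post_sd < W])                # smallest n per arm giving posterior sd below W
        ```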

    • Sure. Time, money and the availability of subjects are very important considerations. But it is recommended (*) that clinical studies be powered for the minimal clinically relevant effect, or “the effect you wouldn’t want to miss”. It’s then to be expected that many (even most) experimental treatments do not have such effects.

      (*) The EMA’s ICH E9 guideline states: “The treatment difference to be detected may be based on a judgement concerning the minimal effect which has clinical relevance in the management of patients or on a judgement concerning the anticipated effect of the new treatment, where this is larger.”
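      For what it’s worth, a minimal sketch of sizing a trial for such a minimal clinically relevant effect (the 0.3 sd value is only an assumption for illustration):

      ```r
      mcid <- 0.3   # minimal clinically relevant difference, in sd units (assumed)
      power.t.test(delta = mcid, sd = 1, power = 0.80, sig.level = 0.05,
                   type = "two.sample")$n
      # About 175 per arm, usually far more than a budget built around an
      # optimistic effect size will cover.
      ```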

    • Well, they are “plausible” in the sense that they wouldn’t be laughed off the grant evaluation committee… The way it works, imho, is that after computing the required effect size for 80% power at the given fixed budget, if it’s laughably large you don’t submit the grant. If it’s nudge-nudge-wink-wink plausible, then you go ahead and submit the grant. But if you eliminated all the rigamarole and just surveyed experts in the field (“what’s a reasonable effect size you’d expect for a treatment?”), they’d likely give estimates more in line with what’s actually found.

  2. Thanks for re-posting, Andrew! These papers have now been published (all Open Access thanks to the deep pockets of the Dutch government). I hope I got the html tags right:

    The significance filter, the winner’s curse and the need to shrink

    A Proposal for Informative Default Priors Scaled by the Standard Error of Estimates

    The statistical properties of RCTs and a proposal for shrinkage

    And there are even 2 more:

    Addressing exaggeration of effects from single RCTs with Simon Schwab and Sander Greenland

    How large should the next study be? Predictive power and sample size requirements for replication studies with Steve Goodman

  3. Neat stuff! I hope we can start to trickle out awareness of these issues to more and more fields. In my experience, it is still quite uncommon.
    Quick Q: why are we fitting a 3-component normal mixture to the Cochrane database in the linked Rmd file? The RHS of the figure in section 2 looks quite unimodal and normal to me – I’m not sure how those components are even identifiable (see the quick simulation sketch below). On the other hand, I’m probably missing something obvious!

    -Chris
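    P.S. On the unimodality point: a mixture of normals with heavily overlapping components can look perfectly unimodal, so the shape of the histogram alone doesn’t rule out three components (though it is also why they can be weakly identified). A quick simulation with made-up weights, means and sds, not the Rmd’s actual fit:

    ```r
    set.seed(1)
    n  <- 1e5
    z  <- sample(1:3, n, replace = TRUE, prob = c(0.3, 0.5, 0.2))  # component labels
    mu <- c(-0.5, 0, 0.6)   # assumed component means
    s  <- c(1, 1.6, 2.5)    # assumed component sds
    x  <- rnorm(n, mean = mu[z], sd = s[z])   # draws from the 3-component mixture
    plot(density(x), main = "Unimodal-looking, yet a 3-component mixture")
    ```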
