Erik van Zwet writes:

The post (“The Shrinkage Trilogy: How to be Bayesian when analyzing simple experiments”) didn’t get as many comments as I’d hoped, so I wrote a short explainer and a reading guide to help people understand what we’re up to.

All three papers have the same very simple model. We abstract a study as a triple (beta, b, s), where beta is the true effect and b is an unbiased, normally distributed estimator with standard error s. We also define the z-value b/s and the signal-to-noise ratio (SNR) beta/s. The SNR is really important because it directly determines the (achieved) power, the type M error (exaggeration ratio), the type S error, and more.

The z-value is the sum of the SNR and standard normal noise. So the distribution of the z-value is the convolution of the distribution of the SNR and N(0,1). From the distribution of the z-value we can recover the distribution of the SNR by deconvolution.
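The convolution relationship is easy to check by simulation. In this sketch the normal SNR distribution is a made-up stand-in (the papers estimate it from data); the point is only that adding independent standard normal noise to the SNR gives the z-value, so the variances add:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical SNR distribution for a corpus of studies (an assumption;
# the papers estimate this distribution nonparametrically from data).
snr = rng.normal(loc=1.0, scale=2.0, size=200_000)

# z-value = SNR + independent standard normal noise, so the distribution
# of z is the convolution of the SNR distribution with N(0,1).
z = snr + rng.standard_normal(snr.size)

# Convolution of independent variables adds variances: Var(z) = Var(SNR) + 1.
print(round(z.var(), 2), round(snr.var() + 1, 2))
```

Deconvolution runs this in reverse: given the observed z-values, recover the SNR distribution by "subtracting off" the N(0,1) noise.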

Now suppose we have a particular study with an estimate b and standard error s. We propose to embed this study in a large collection (corpus) of similar studies with estimates b_j and standard errors s_j. From the pairs (b_j,s_j) we can estimate the distribution of the z-value and then (by deconvolution) the distribution of the SNR. If we scale the distribution of the SNR by the standard error of the study of interest, we get a prior for the true effect (beta) of that study.
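Under a normal working approximation to the SNR distribution (an assumption made here for illustration; the papers use a more flexible estimate), the scale-then-shrink step is just conjugate normal updating:

```python
# Minimal sketch of the scaling-and-shrinkage step, assuming the corpus
# SNR distribution is summarized by a normal N(snr_mean, snr_var).
def shrink(b, s, snr_mean, snr_var):
    """Posterior mean and variance for beta, given estimate b with standard
    error s, under the prior beta ~ N(s*snr_mean, s^2*snr_var) obtained by
    scaling the SNR distribution by s."""
    m0, v0 = s * snr_mean, s**2 * snr_var   # scaled prior
    w = v0 / (v0 + s**2)                    # shrinkage weight on the data
    post_mean = w * b + (1 - w) * m0
    post_var = w * s**2
    return post_mean, post_var

# Example: a noisy estimate gets pulled toward the prior mean.
mean, var = shrink(b=2.0, s=1.0, snr_mean=0.0, snr_var=1.0)
print(mean, var)  # b = 2.0 is shrunk halfway to 0 when snr_var = 1
```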

That’s the whole idea, but there are some interesting things along the way:

paper #1

Theorem 1 proves a claim of Andrew that the type M error is large when the SNR – or, equivalently, the power – is low. See

and More on my paper with John Carlin on Type M and Type S errors
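The claim can be checked with a quick Monte Carlo sketch (a standard significance-filter simulation, not code from the paper): condition on statistical significance and compare the size of the estimate to the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)

def exaggeration(snr, n=1_000_000):
    """Monte Carlo type M error: the mean of |b|/|beta| among studies that
    reach |z| > 1.96, for a study with true effect beta = snr and s = 1."""
    z = snr + rng.standard_normal(n)   # simulated z-values
    sig = np.abs(z) > 1.96             # "statistically significant" studies
    return np.abs(z[sig]).mean() / abs(snr)

print(exaggeration(1.0))  # low power (~17%): significant estimates are exaggerated ~2.5x
print(exaggeration(3.0))  # decent power (~85%): exaggeration close to 1
```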

Proposition 2 formalizes another post of Andrew:

Bayesian inference completely solves the multiple comparisons problem

paper #2

On p. 4 we propose a new point of view on what it means for one prior to be more informative than another.

Theorem 2 says that scaling the distribution of the SNR to get a prior for beta (as we’re proposing) is the only way to ensure that the posterior inference is unaffected by changes of measurement unit.

On p. 6 we discuss the anthropic principle in the context of our method; see also https://statmodeling.stat.columbia.edu/2018/05/23/anthropic-principle-statistics/

Theorem 3. We’re proposing to scale the distribution of the SNR by the standard error, but then the prior for beta depends on the sample size. That is un-Bayesian and does not necessarily yield a consistent estimate. Theorem 3 says that, depending on the shape of the prior, consistent estimation is still possible.

Section 5.3 offers a proposal for the “Edlin factor”, see https://statmodeling.stat.columbia.edu/2014/02/24/edlins-rule-routinely-scaling-published-estimates/

paper #3

Figure 1 and Table 1. We estimate the distribution of the z-value and (by deconvolution) the distribution of the SNR from 20,000 pairs (b_j,s_j) from RCTs in the Cochrane database.

Figure 2. We can transform the distribution of the SNR into the distribution of the (achieved) power. We find that the (achieved) power is typically quite low, see also https://statmodeling.stat.columbia.edu/2017/12/04/80-power-lie/
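The SNR-to-power transformation is just the two-sided rejection probability of z ~ N(SNR, 1). A minimal sketch using only the standard library:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF, built from math.erf."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def achieved_power(snr, crit=1.96):
    """Probability that |z| > crit when z ~ N(snr, 1), i.e. the achieved
    power of a two-sided 5% test as a function of the SNR."""
    return Phi(-crit - snr) + 1 - Phi(crit - snr)

# SNR = 2.8 corresponds to the conventional 80% power;
# lower SNRs give much less.
for snr in (0.5, 1.0, 2.8):
    print(f"SNR = {snr}: power = {achieved_power(snr):.2f}")
```

So a low estimated SNR distribution translates directly into a low distribution of achieved power.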

Figure 4. From the distribution of the SNR, we can also derive the distribution of the type M error. We show the conditional distribution of the exaggeration |b|/|beta| in the left panel of Figure 4. In the right panel, we show how our method fixes that.

I replied that, realistically, it’s tough to get comments sometimes, as there are hard problems to think about.

Erik responded:

I guess you’re right about these things being difficult. But if anyone should understand, it’s your readers. You’ve been telling them about the trouble with noisy estimates for years now! That’s why I tried to link up those papers to some of your earlier posts. Maybe that will help to provide the context.

Good point. The audience of this blog should indeed be receptive to these ideas.

Crickets…

Erik,

After looking through your excellent paper with Simon Schwab & Stephen Senn (paper #3), I was left with a big question: Why didn’t you estimate the exaggeration ratio of meta-analysis results themselves? In other words, why not meta-fy your analysis and estimate the exaggeration ratio for the summary effect estimates?

Your team put a lot of work into estimating exaggeration ratios for the population of individual studies, with results that are fascinating and deserve a lot of attention.

However, most clinical researchers would have no understanding of the statistics in your work. For them, your article could simply be viewed as another item in a long list of evidence that supports the already common caution, “don’t rely much on results from a single study.”

On the other hand, a meta-analytic effect estimate with p less than 0.05 is seen as the highest level of evidence in support of a therapy, and is often make-or-break for recommendations in clinical practice guidelines and government approvals.

If there are high exaggeration ratios for meta-analyses with p less than 0.05, that could be a groundbreaking finding for evidence-based medicine. Many people have the mindset, “You can’t rely on individual studies, so trust the meta-analysis.” I think they would be shocked to learn that efficacy is commonly overestimated in statistically significant meta-analyses, even without considering biases from file drawer effects, researcher degrees of freedom in endpoint measures, and so on.

If exaggeration ratios are usually small for meta-analyses with p less than 0.05, then this would be hugely reassuring for medicine as practiced today. So, regardless of whether a careful assessment shows high or low exaggeration ratios for meta-analytic results, it seems like the assessment would be worthwhile.

I looked through your paper a few weeks ago and, comparing with some other research on meta-analyses, it seemed like exaggeration ratios would often be more than 1.5 for meta-analytic findings in many medical fields. Largely, this seems to be because the published meta-analyses in those fields often include only 4 or 5 individual studies, all of moderate or small size. However, it’s also very possible that I misunderstood and misapplied your methods. Maybe I’m even totally off base in trying to apply your methods to summary effect estimates; I’m not certain.

By the way, I’m also surprised to not see more comments on your work. But hey, my comment is so long it must count as 3 or 4…

Charles, thanks for commenting! We find that the signal-to-noise ratio (SNR) in single studies is often low, which means that effects tend to be overestimated. I agree that this seems to be just one more warning not to rely on single studies. But by quantifying the overestimation and proposing a fix (shrinkage), I like to think we have more to offer.

Your suggestion to look at meta-analyses is a very good one, especially since the studies in the Cochrane database are grouped by meta-analysis. We’ll probably have to do something about between-study heterogeneity, but maybe we can use the results of Turner et al. I’ll certainly give it a try!

Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JPT. Predicting the extent of heterogeneity in meta-analysis, using empirical data from the Cochrane Database of Systematic Reviews. Int J Epidemiol 2012; 41: 818-27.

Erik,

Thanks for your response & I’m glad to hear my comment was useful.

Yes, agreed. Your article has much more to offer, especially since you both propose a way to quantify the overestimation and follow through with actual results for Cochrane-included studies.

However, I hope the statistical innovativeness of your contributions doesn’t lead your team to lose track of the clinical and regulatory audience. They won’t notice statistical innovation and don’t place much faith in single studies, but many would be surprised to learn about this kind of exaggeration in meta-analytic findings.

Imagine USPSTF or NIHR learning that their careful, thorough meta-analyses were typically overestimating benefits to patients by, say, 1.7 fold. If I were in their shoes, I would be shocked.

Regarding your proposed fix: Given the effort and expense that PIs and sponsors invest in trials, they may not want to shrink their efficacy estimates substantially toward the null, especially when their competitors are not doing so. However, groups performing meta-analyses ideally don’t have COIs or personal involvement and can apply the fix equally across competing researchers and companies. So a fix that is targeted at the meta-analysis level may also be more tractable than a fix targeted at the level of single studies.

Thanks! Hm. Since you mention issues posed by between-study heterogeneity, I’m guessing that you are thinking about quantifying the overestimation in meta-analyses as follows: Repeat the methods that you already use, with the exception that you include the b and se values from the meta-analyses, instead of the b and se values from the individual studies.

However, what is the obstacle to this alternative: (1) apply your proposed priors to the individual studies, (2) meta-analyze, and (3) contrast results from these adjusted meta-analyses with results from the original meta-analyses?
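A toy version of that alternative, with hypothetical numbers and a simple normal prior standing in for the estimated SNR distribution (both are assumptions for illustration only):

```python
import numpy as np

def shrink(b, s, snr_mean=0.0, snr_var=1.0):
    """Posterior mean and variance for one study, under the prior
    beta ~ N(s*snr_mean, s^2*snr_var) from scaling the SNR distribution."""
    m0, v0 = s * snr_mean, s**2 * snr_var
    w = v0 / (v0 + s**2)                 # shrinkage weight on the data
    return w * b + (1 - w) * m0, w * s**2

def meta(means, variances):
    """Fixed-effect (inverse-variance) pooled estimate and its variance."""
    w = 1 / np.asarray(variances)
    return float(np.sum(w * means) / w.sum()), float(1 / w.sum())

b = np.array([0.8, 0.3, 1.1, 0.5])       # hypothetical study estimates
s = np.array([0.4, 0.3, 0.5, 0.35])      # hypothetical standard errors

raw_est, _ = meta(b, s**2)               # original meta-analysis
post = [shrink(bi, si) for bi, si in zip(b, s)]          # step (1)
adj_est, _ = meta(np.array([m for m, _ in post]),        # step (2)
                  np.array([v for _, v in post]))
print(raw_est, adj_est)                  # step (3): compare pooled estimates
```

With a zero-mean prior the adjusted pooled estimate is pulled toward the null, and the contrast between the two pooled values is exactly the exaggeration one would report at the meta-analysis level.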

Anyhow, best of luck.

“Given the effort and expense that PIs and sponsors invest in trials, they may not want to shrink their efficacy estimates substantially toward the null, especially when their competitors are not doing so.”

That’s true, but they don’t have to. Based on the reported effect and standard error, the reader can do their own shrinkage!

I can think of various ways to apply our methods to meta-analyses and I certainly want to give it a try. If you send me an email (I’m easy to find at the LUMC), I can let you know if I make any progress. Perhaps you’ll be able to give me some more feedback.

For what it’s worth, the last post/papers have been among the perpetually open tabs on my computer since they were posted. This post is giving me a little more of a push (and I’m posting this comment to push myself further), so we’ll see if I get to them soon; I’ll post any comments or questions I have then. Thanks for your work on these, Erik.

Michael, it’s worth a lot to me that we’re on your to-do list! Please comment or email me if you’d like to discuss something.

Just a spontaneous thought: a shiny demo could be a good way to get this sort of point across to readers, letting them play with the parameters and see the results.

Just a thought.

I made a shiny demo a while ago, but it still needs some work. You can have a look at vanzwet.shinyapps.io/shiny/. Let me know what you think!

The type S error has applicability in the verbal representation (generalization) of findings. This is a fundamental issue concerning how statistical analysis is communicated.

https://link.springer.com/epdf/10.1007/s11192-021-03914-1?sharing_token=mMuqWexCtM22kWb_cmt1Wve4RwlQNchNByi7wbcMAY5jCNz70gaNM4YkspozCdZFaXaTaF37JLE-8xNO0FdBUZWEiRV_9i5NSBXAp7PoLoygXzJCux_3zkFOq8YD8AT8l0b9MJrLa8P2JUo8DdZIGSztgvaqnslxV4j5m48GJNo%3D

Thanks for the link! I’ll read the paper, and get in touch with you.