Jayakrishna Ambati writes:

I am a retina specialist and vision scientist at the University of Virginia. I am writing to you with a question on Bayesian statistics.

I am performing a meta-analysis of 5 clinical studies. In addition to a random-effects meta-analysis model, I am running Bayesian meta-analysis models using half-normal priors. I’ve seen scales of 0.5 or 1.0 being used. What determines this choice? Why can’t it be 0.1 or 0.2, for example? Can I use the value of the heterogeneity tau (obtained from the random-effects model) to calculate sigma and make that, or a multiple of it, the value of the scale?

My reply:

With only 5 groups, it can help to use an informative prior on the group-level variance. What’s a good prior to use? It depends on your prior information! How large are the effects that you might see? You can play it safe and use a weak prior, even a uniform prior on the group-level scale parameter: this will, on average, lead to an overestimate of the group-level scale, which in turn will yield an overstatement of uncertainty.
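To see how the choice of half-normal scale plays out with only 5 studies, here is a minimal sketch (with made-up effect estimates and standard errors, purely for illustration) of the normal-normal meta-analysis model. The overall mean is integrated out under a flat prior, and the marginal posterior of tau is computed on a grid under half-normal priors with different scales:

```python
import numpy as np
from scipy.stats import halfnorm

# Hypothetical data: 5 study effect estimates with standard errors.
y  = np.array([0.30, 0.12, 0.45, 0.18, 0.25])
se = np.array([0.15, 0.10, 0.20, 0.12, 0.18])

def tau_log_marginal(tau, y, se):
    """Log marginal likelihood of tau in the normal-normal model,
    with the overall mean mu integrated out under a flat prior."""
    V = se**2 + tau**2                      # total variance per study
    w = 1.0 / V
    mu_hat = np.sum(w * y) / np.sum(w)      # precision-weighted mean
    return (-0.5 * np.sum(np.log(V))
            - 0.5 * np.log(np.sum(w))       # from integrating out mu
            - 0.5 * np.sum(w * (y - mu_hat)**2))

def tau_posterior_mean(prior_scale, y, se):
    """Posterior mean of tau on a grid, with a half-normal(0, prior_scale) prior."""
    grid = np.linspace(1e-4, 2.0, 2000)
    log_post = np.array([tau_log_marginal(t, y, se) for t in grid])
    log_post += halfnorm.logpdf(grid, scale=prior_scale)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return np.sum(grid * post)

for s in (0.1, 0.5, 1.0):
    print(f"prior scale {s}: posterior mean tau = {tau_posterior_mean(s, y, se):.3f}")
```

With only 5 studies the data say little about tau, so the posterior tracks the prior scale closely: a scale of 0.1 pulls tau toward zero much harder than a scale of 1.0. That is exactly why the scale should come from substantive knowledge of how large between-study heterogeneity could plausibly be.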

Regarding the specific question of why you’ll see normal+(0,1) or normal+(0,0.5): This depends on the problem under study, but we can get some insight by thinking about scaling.

Consider two important special cases:

1. Continuous outcome, multilevel linear regression with predictors and outcomes scaled to have sd’s equal to 0.5 (this is our default choice because a binary variable coded as 0 and 1 will have sd of approximately 0.5): we’d expect coefficients to be less than 1 in absolute value, hence a normal+(0,1) prior on the sd of a set of coefs should be weakly informative.

2. Binary outcome, multilevel logistic regression, again scaling predictors to have sd’s equal to 0.5: again, we’d expect coefs to be less than 2 in absolute value (a shift of 2 on the logit scale is pretty big), hence a normal+(0,0.5) prior on the sd of a set of coefs should be weakly informative.

In many cases, normal+(0,0.2) or normal+(0,0.1) will be fine too, in examples such as policy analysis and some areas of biomedicine where we would not expect huge effects.
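The rescaling convention behind both special cases above can be sketched in a few lines (the predictor values here are arbitrary, just to show the transformation): center each input and divide by two standard deviations, which puts a continuous predictor on the same scale as a balanced 0/1 binary predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: a continuous predictor and a binary indicator.
x_cont = rng.normal(50, 10, size=200)    # e.g., age in years
x_bin  = rng.integers(0, 2, size=200)    # e.g., treatment indicator

def scale_to_half_sd(x):
    """Center and divide by two sd's, so the result has sd 0.5.
    This matches a balanced 0/1 binary input, whose sd is about 0.5."""
    return (x - x.mean()) / (2 * x.std())

z = scale_to_half_sd(x_cont)
print(z.std())       # 0.5 by construction
print(x_bin.std())   # roughly 0.5 when the groups are roughly balanced
```

Once all inputs sit on this common scale, a statement like "coefficients should be less than 1 (linear) or less than 2 (logistic) in absolute value" applies uniformly, which is what makes the normal+(0,1) or normal+(0,0.5) priors on the group-level sd interpretable across problems.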

A related question came up last month regarding priors for non-hierarchical regression coefficients.

A common mistake in meta-analysis is to presume (without adequate assessment) that all the studies _should be_ combined.

Some excerpts from our unpublished philosophy of aggregation paper, http://www.stat.columbia.edu/~gelman/research/unpublished/Amalgamating6.pdf:

“Commonness refers to studies aiming at the same target (aspect of reality) as well as being qualitatively similar evidence of that target, hopefully varying only in precision which can be readily assessed. On the other hand, qualitatively different data sources can vary in their bias, which may be very difficult to assess and properly correct for so that something is actually common.”

“Awareness of commonness can lead to an increase in evidence regarding the target; disregarding commonness wastes evidence; and mistaken acceptance of commonness destroys otherwise available evidence. It is the tension between these last two processes that drives many of the theoretical and practical controversies within statistics.”

Variation in the quality of the studies may suggest normal+(0,1) + k, where k would mitigate the biases. Exactly what the value of k should be is a very tricky thing to puzzle out. For a more thorough discussion, see https://academic.oup.com/ije/article/43/6/1969/705764

Unless I’m mistaken (I’m not an expert), the correct expression is weatherman, not retina specialist.