(Towards) a solution to a 40-year-old problem: Prior distributions for variance parameters in hierarchical models

Fully Bayesian analyses of hierarchical linear models have been considered for at least forty years. A persistent challenge has been choosing a prior distribution for the hierarchical variance parameters. Proposed models include uniform distributions (on various scales), inverse-gamma distributions, and other families. We have recently made some progress in this area (see this paper, to appear in the online journal Bayesian Analysis).


The inverse-gamma has been popular recently (partly because it was used in the examples in the manual for the Bugs software package), but it has some unattractive properties. Most notably, as the hyperparameters approach zero, the posterior distribution does not approach any reasonable limit. This casts suspicion on the standard use of prior densities such as inverse-gamma(0.001,0.001).
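A rough way to see how much the inverse-gamma(ε,ε) prior moves with ε is to compute how much prior mass it puts on a fixed range of the standard deviation. The sketch below is only an illustration of prior sensitivity, not the posterior argument itself. The function name, the range [0.1, 10], and the trapezoid-rule integration are all assumptions made for this example.

```python
import math


def inv_gamma_mass_on_sd(eps, sd_lo=0.1, sd_hi=10.0, steps=4000):
    """Prior probability that the sd lies in [sd_lo, sd_hi] when the
    variance has an inverse-gamma(eps, eps) prior.

    Hypothetical helper for illustration only.
    """
    # If variance ~ inverse-gamma(eps, eps), then g = 1/variance
    # ~ Gamma(shape=eps, rate=eps). Integrate that density over u = log(g).
    u_lo = math.log(1.0 / sd_hi ** 2)
    u_hi = math.log(1.0 / sd_lo ** 2)
    log_norm = eps * math.log(eps) - math.lgamma(eps)

    def density_u(u):
        # Gamma density in g, times the Jacobian dg/du = exp(u).
        return math.exp(log_norm + eps * u - eps * math.exp(u))

    # Plain trapezoid rule on the log scale, where the integrand is smooth.
    h = (u_hi - u_lo) / steps
    total = 0.5 * (density_u(u_lo) + density_u(u_hi))
    for i in range(1, steps):
        total += density_u(u_lo + i * h)
    return total * h


if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.01, 0.001):
        print(eps, inv_gamma_mass_on_sd(eps))
```

Under these assumptions, the mass on the fixed range shrinks toward zero as ε decreases, so the choice of ε is far from innocuous.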


We have a new folded-noncentral-t family of prior distributions that are conditionally conjugate for the hierarchical normal model. The trick is to use a redundant multiplicative parameterization for the hierarchical standard deviation parameter: we write it as the absolute value of a product of two random variables, one with a normal prior distribution and the other with a square-root-inverse-gamma model. The result is a folded-noncentral-t.

A special case of this model is the half-Cauchy (that is, the positive part of a Cauchy distribution, centered at 0). We tried it out on a standard example–the 8-schools problem from Chapter 5 of Bayesian Data Analysis–and it works well, even for the more challenging 3-schools problem, where usual noninformative prior distributions don’t work so well.
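Here is a minimal simulation sketch of the product construction, assuming a normal factor with mean `mu` and standard deviation `scale`, and a second factor whose square is inverse-gamma(nu/2, nu/2). These names and this particular parameterization are assumptions for illustration and may not match the notation in the paper. With `mu = 0` and `nu = 1` the product is a scaled Cauchy, so its absolute value is half-Cauchy, and the median of a half-Cauchy equals its scale, which gives a simple check.

```python
import math
import random
import statistics


def sample_folded_noncentral_t(n, mu=0.0, scale=1.0, nu=1.0, seed=0):
    """Draw n values of |xi * eta| (hypothetical helper), where
    xi ~ Normal(mu, scale^2) and eta^2 ~ inverse-gamma(nu/2, nu/2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        xi = rng.gauss(mu, scale)
        # 1/eta^2 ~ Gamma(shape=nu/2, rate=nu/2); random.gammavariate
        # takes (shape, scale), so the scale argument is 2/nu.
        precision = rng.gammavariate(nu / 2.0, 2.0 / nu)
        eta = 1.0 / math.sqrt(precision)
        # Folding: the standard deviation is the absolute value of the product.
        draws.append(abs(xi * eta))
    return draws


if __name__ == "__main__":
    draws = sample_folded_noncentral_t(20000, mu=0.0, scale=2.0, nu=1.0)
    # For the half-Cauchy special case the median should be near the scale.
    print(statistics.median(draws))
```

The mean of these draws is not a useful summary, since the half-Cauchy has no finite mean; the median or other quantiles are the safer check.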

Hierarchy of hierarchies

The half-Cauchy and its related distributions need some hyperparameters to be specified. The next step is to estimate these from data, by setting up a hyper-hyper-prior distribution on the multiple variance parameters that will exist in any hierarchical model. Actually, it’s sort of cool–I think it’s the next logical step to take in Bayesian Anova. We have an example in Section 6 of this paper.

What comes next?

The next step is to come up with reasonable models for deep interactions (see here for some of our flailing in this area). Currently, the most challenging problems in multilevel models arise with sparse data with many possible levels of variance–and these are the settings where hierarchical hyper-hyper modeling of variance parameters should be possible. I think we’re on the right track, at least for the sorts of social-science problems we work on.

The other challenge is multivariate hierarchical modeling, for example the covariance structures that arise in varying-intercept, varying-slope models. Here I think the so-called Sam method has promise, but we’re still thinking about this.

One Comment

  1. Dan Navarro says:

    The pile of things to read just got a little deeper. Sigh. But in all seriousness, that sounds really cool. I've often used "noninformative" inverse Gammas as priors for variance parameters (and other things bounded below at zero), and been a little concerned about what happens in the limit as the parameters on the inverse Gamma go to 0. Definitely need to read this.