Finite-population standard deviation in a hierarchical model

Karri Seppa writes:

My topic is regional variation in the cause-specific survival of breast cancer patients across the 21 hospital districts in Finland, this component being modeled by random effects. I am interested mainly in the district-specific effects, and with a hierarchical model I can get reasonable estimates also for sparsely populated districts.

Based on the recommendation given in the book by yourself and Dr. Hill (2007) I tend to think that the finite-population variance would be an appropriate measure to summarize the overall variation across the 21 districts. However, I feel it is somewhat incoherent first to assume a Normal distribution for the district effects, involving a “superpopulation” variance parameter, and then to compute the finite-population variance from the estimated district-specific parameters. I wonder whether the finite-population variance would be more appropriate in the context of a model with fixed district effects?

My reply:

I agree that these points can be confusing, as can be seen by the 5 different definitions of fixed/random effects that I discuss in the Anova paper. There is simply no way of making everybody happy!

Here’s what I would say: Your goal is to estimate what’s going on in these 21 districts. To the extent there is a “true” superpopulation, it could be thought of as representing variation over time as well as space. But, mathematically, the superpop and the associated normal (or whatever) distribution can be viewed as a tool for getting statistically efficient estimates for the 21 districts that you have. Now that you have simultaneously estimated parameters for these 21 districts, you might also be interested in ensemble properties, for example the maximum, minimum, interquartile range, or even–gasp–standard deviation of these 21 numbers. It’s well known that no single point estimate in a high-dimensional space can capture ensemble properties (for example, the standard deviation of the 21 posterior means understates the standard deviation of the 21 true effects)–the key paper here is a 1984 article by Tom Louis, which is referred to in one of my books (BDA or ARM). I guess what I’m saying is that you should make it clear that your goal is the 21 districts and that the Bayesian inference and superpop are tools for getting there.
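To make the distinction concrete, here is a minimal sketch in Python. It is not the analysis from the paper: the data are simulated, the superpopulation mean is fixed at 0, and the superpopulation sd `tau` and the standard errors `se` are treated as known so that the posterior is available in closed form. The function name `finite_population_sd_draws` and all numerical values are made up for illustration. The idea is to compute the finite-population sd within each posterior draw and then summarize those draws, rather than taking the sd of the 21 point estimates.

```python
import numpy as np


def finite_population_sd_draws(theta_draws):
    """Finite-population sd across units, computed separately for each draw.

    theta_draws has shape (n_draws, J); the result has shape (n_draws,).
    Uses the 1/(J-1) divisor.
    """
    return np.std(theta_draws, axis=1, ddof=1)


rng = np.random.default_rng(0)

# Simulated setup (all values are illustrative assumptions).
J = 21                                      # number of districts
tau = 0.3                                   # assumed known superpopulation sd
theta_true = rng.normal(0.0, tau, size=J)   # "true" district effects
se = rng.uniform(0.1, 0.6, size=J)          # assumed known standard errors
y = rng.normal(theta_true, se)              # noisy district-level estimates

# Conjugate normal posterior for each district, given mean 0 and known tau.
shrink = tau**2 / (tau**2 + se**2)
post_mean = shrink * y
post_sd = np.sqrt(shrink) * se

# Joint posterior draws (districts are independent given the hyperparameters).
theta_draws = rng.normal(post_mean, post_sd, size=(4000, J))

# Ensemble property computed per draw, then summarized.
sd_draws = finite_population_sd_draws(theta_draws)
sd_summary = np.percentile(sd_draws, [2.5, 50, 97.5])

# The tempting shortcut: sd of the point estimates, which is too small.
sd_of_means = np.std(post_mean, ddof=1)

print("posterior of finite-population sd (2.5%, 50%, 97.5%):", sd_summary)
print("sd of posterior means:", sd_of_means)
```

In a real analysis the draws would come from the fitted model (with `tau` estimated rather than fixed), and the same per-draw calculation works for the maximum, minimum, or interquartile range.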

Here’s their paper.

2 thoughts on “Finite-population standard deviation in a hierarchical model”

  1. A related issue is how to display plots of the estimates of individual-level parameters, {theta_i}, of the finite population in a hierarchical model. For example, a histogram of the marginal posterior means E(theta_i | y) would be underdispersed with respect to the true distribution, if partial pooling is substantial. In this case, it intuitively makes more sense to display the posterior mean of the histogram rather than the histogram of posterior means.

    But what if the theta_i are 2-vectors and you want to make a bivariate scatter plot showing the finite-population distribution of the paired parameter estimates? If shrinkage is important, then the scatter plot of the marginal posterior means E(theta_i | y) is also underdispersed. But it is not clear how to make the posterior mean of the scatter plot rather than the scatter plot of posterior means. Any thoughts?

    An alternative is to show several scatter plots showing snapshots of the MCMC, i.e. joint draws from the full posterior P(theta | y). This would be similar to the multiply-imputed maps discussed by Gelman & Price (1999). But this would be a cumbersome solution.
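The "posterior mean of the histogram" mentioned in the comment above can be sketched as follows. This is only an illustration under stated assumptions: the posterior draws are simulated from made-up means and sds, and the function name `posterior_mean_histogram` and the bin edges are hypothetical. Each draw gets its own histogram over shared bins, and the bin counts are then averaged across draws.

```python
import numpy as np


def posterior_mean_histogram(theta_draws, bin_edges):
    """Average, over posterior draws, of the per-draw histogram counts.

    theta_draws has shape (n_draws, J); returns one expected count per bin.
    """
    counts = np.stack(
        [np.histogram(draw, bins=bin_edges)[0] for draw in theta_draws]
    )
    return counts.mean(axis=0)


rng = np.random.default_rng(1)

# Simulated posterior summaries for J = 21 units (illustrative values only).
post_mean = np.linspace(-0.3, 0.3, 21)
post_sd = np.full(21, 0.25)
theta_draws = rng.normal(post_mean, post_sd, size=(2000, 21))

bin_edges = np.linspace(-1.5, 1.5, 13)

# Posterior mean of the histogram: spreads mass into the tails.
mean_hist = posterior_mean_histogram(theta_draws, bin_edges)

# Histogram of posterior means: concentrated near the center.
hist_of_means = np.histogram(post_mean, bins=bin_edges)[0]

print("posterior mean of histogram:", np.round(mean_hist, 2))
print("histogram of posterior means:", hist_of_means)
```

For 2-vectors, one possibility (a suggestion, not something from the post) is to apply the same averaging to `np.histogram2d` counts and display the result as a density image instead of a point scatter.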
