Somewhat Bayesian multilevel modeling

Eric McGhee writes:

I’m trying to generate county-level estimates from a statewide survey of California using multilevel modeling. I would love to learn the full Bayesian approach, but I’m on a tight schedule and worried about teaching myself something of that complexity in the time available.

I’m hoping I can use the classical approach and simulate standard errors using what you and Jennifer Hill call the “informal Bayesian” method. This has raised a few questions:

First, what are the costs of using this approach as opposed to full Bayesian?

Second, when I use the predictive simulation as described on p. 149 of “Data Analysis” on a binary dependent variable and a sample of 2000, I get a 5%-95% range of simulation results so large as to be effectively useless (on the order of +/- 15 points). This is true even for LA county, which has enough cases by itself (about 500) to get a standard error of about 2 points from simple disaggregation. However, if I simulate only with the coefficients and skip the step of random draws from a binomial distribution (i.e., use the technique described at the bottom of p. 148), I get results that are much more sensible (around +/- 5 points). Do the random draws from the binomial distribution only apply to out-of-sample predictions? Or do they apply to in-sample predictions, too? If the latter, any idea why I would be getting such a large range of results? Might that be signaling something wrong with the model, or with my R code?

Finally, when dealing with simulation results, what would most closely correspond to a margin of error? The 5%-95% interval mentioned above? Or something else? I need a way of summarizing uncertainty using terminology that is familiar to a policy audience.

My reply:

The main benefit of full Bayes over approximate Bayes (of the sort done by lmer(), for example, and used in many of the examples in my book with Jennifer) arises when group-level variances are small. Approximate Bayes gives a point estimate of the variance parameters and then proceeds as if that estimate were the truth, which understates uncertainty compared to full Bayes, which averages over the uncertainty in those parameters. We are working on an add-on to lmer()-like programs to include some of that uncertainty, but it isn't done yet, so I don't have an R package to conveniently offer you here.
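To see why plugging in a point estimate of the group-level variance understates uncertainty, here is a toy simulation in Python. All numbers are hypothetical, and the distribution on sigma is a crude stand-in for a posterior, not anything lmer() actually produces; the point is only the direction of the effect.

```python
import random
import statistics

random.seed(1)

# Hypothetical setup: the county-level sd is estimated as 0.05, but with
# few counties that estimate is itself uncertain.
n_sims = 5000
sigma_hat = 0.05

# Plug-in (lmer()-style): treat the estimated sd as known.
plug_in = [random.gauss(0, sigma_hat) for _ in range(n_sims)]

# Full-Bayes-like: also draw sigma itself from a distribution representing
# its uncertainty (a crude stand-in, purely for illustration).
full_ish = []
for _ in range(n_sims):
    sigma = abs(random.gauss(sigma_hat, 0.02))
    full_ish.append(random.gauss(0, sigma))

# The draws that propagate uncertainty in sigma are more spread out.
print(f"plug-in sd of county effects:        {statistics.stdev(plug_in):.3f}")
print(f"sd with sigma uncertainty included:  {statistics.stdev(full_ish):.3f}")
```

The gap is modest here, but it grows when the variance parameter is poorly estimated, which is exactly the small-group-level-variance situation described above.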

Regarding your simulation question: yes, if you're interested in estimating the underlying proportions for all of California, you don't want to do that binomial simulation. The binomial draws are only for simulating some finite amount of new data; for an in-sample estimand such as a county's true level of support, the uncertainty in the coefficients is what you want, and the extra binomial noise just inflates your intervals.
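Here is a quick sketch of the distinction in Python. The numbers are made up (the draws of the county's underlying probability are just a normal distribution standing in for coefficient-based simulations), but it shows how binomial draws for a small new sample widen the interval well beyond the uncertainty in the estimand itself.

```python
import random

random.seed(42)

# Hypothetical: 1000 simulation draws of a county's underlying support
# probability, reflecting only coefficient uncertainty (roughly +/- 5 points).
n_sims = 1000
p_sims = sorted(random.gauss(0.50, 0.025) for _ in range(n_sims))

# In-sample estimand: the underlying proportion. Summarize p_sims directly.
lo, hi = p_sims[int(0.05 * n_sims)], p_sims[int(0.95 * n_sims)]

# Adding binomial draws simulates *new data* of finite size n_new; the
# extra sampling noise is appropriate only for predicting a new sample.
n_new = 50
y_sims = sorted(
    sum(random.random() < p for _ in range(n_new)) / n_new for p in p_sims
)
lo_y, hi_y = y_sims[int(0.05 * n_sims)], y_sims[int(0.95 * n_sims)]

print(f"underlying proportion, 5%-95%: [{lo:.3f}, {hi:.3f}]")
print(f"new data of size {n_new}, 5%-95%: [{lo_y:.3f}, {hi_y:.3f}]")
```

The second interval is several times wider, which matches the pattern in the question: intervals of +/- 15 points with the binomial step versus +/- 5 points without it.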

For the margin of error, you can just compute standard deviations from the simulations and report 2*sd. Or you can use the [2.5%, 97.5%] simulation points, but that interval will be pretty noisy unless you have thousands of simulations.