Something came up in comments today that I’d like to follow up on.

In our earlier post, I brought up an example:

If you’re modeling attitudes about gun control, think hard about what state-level predictors to include. My colleagues and I thought about this a bunch of years ago when doing MRP for gun-control attitudes. Two natural state-level predictors are Republican vote share and percent rural. These variables are also highly correlated. But look at Vermont: it’s one of the most Democratic states and also the most rural. Vermont also is a small state, so the MRP inference for Vermont will depend strongly on the fitted model, which in turn will depend strongly on the coefficients for R vote and %rural. I think you’ll need some strong priors here to get a stable answer. Default flat priors could mess you up. You might not realize the problem when fitting to one particular dataset, but you’ll be getting a really noisy answer.

To elaborate, consider the following four MRP models:

1. No state-level predictors. This is bad because for states without much data, estimates are pooled toward the national mean. This is clearly the wrong thing to do for Montana, say, as Montana is a small state where attitudes toward gun control are probably very far from the national average.

2. Republican vote share as a state-level predictor. Now the estimates are pooled toward the state-level regression model, which will estimate a negative effect of state-level Republican vote share on gun-control attitudes. This will do the right thing for Montana. This model will partially pool Vermont toward other strongly Democratic states such as California, Maryland, and Hawaii.

3. Percent rural as a state-level predictor. This should do ok too, with voters in more rural states being less likely to support gun control, and it should also do the right thing for Montana. This model will partially pool Vermont toward other rural states such as Montana, North Dakota, and Mississippi.

4. Include Republican vote share and percent rural as two state-level predictors. This will do the right thing again for Montana and will split the difference on Vermont. The estimate for Vermont should also have a higher uncertainty, reflecting inferential uncertainty about the relative importance of the two state-level predictors.
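In symbols (a sketch; the variable names are mine, with theta_s the state-s effect in the MRP model), all four models share one linear predictor at the state level:

```latex
\theta_s = \beta_0 + \beta_1 \,\text{RepVote}_s + \beta_2 \,\text{PctRural}_s + \varepsilon_s,
\qquad \varepsilon_s \sim \mathrm{normal}(0, \sigma_{\text{state}})
% Model 1: \beta_1 = \beta_2 = 0   (no state-level predictors)
% Model 2: \beta_2 = 0             (Republican vote share only)
% Model 3: \beta_1 = 0             (percent rural only)
% Model 4: \beta_1, \beta_2 both estimated, with informative priors
```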

Of all of these options, I think #4 is the best—but this presupposes that I’m willing to use strong priors to control the estimation. If I’m only allowed to use weak priors—or, worse, not allowed to use any priors at all—then #4 could give very noisy results.

This is where Bayes comes in. It’s not just that Bayes allows the use of prior information. It’s also that, by allowing the use of prior information, Bayes opens the door to including more information in the model in the form of predictors.

To put it another way, models 1, 2, and 3 above are all special cases of model 4, but with very strong priors where one or two of the coefficients are assumed to be exactly zero. Paradoxically, putting Bayesian priors (or, more generally, some regularization) in model 4 allows us to fit a bigger, more general model than would otherwise be realistically possible.
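To see how this plays out numerically, here is a small simulation sketch (all numbers hypothetical): two highly correlated predictors stand in for Republican vote share and percent rural, and ridge regression stands in for a normal prior on the coefficients, with `lam = 0` giving the flat-prior (least-squares) fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 "states" and two strongly correlated
# state-level predictors (stand-ins for R vote share and %rural).
n = 50
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # corr ~ 0.9
X = np.column_stack([x1, x2])
beta_true = np.array([-0.5, -0.5])

def fit(X, y, lam):
    # Ridge regression: the posterior mode under a normal prior on the
    # coefficients; lam = 0 recovers ordinary least squares (flat prior).
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Refit over many simulated datasets and compare how much the
# coefficient estimates bounce around under each prior.
flat, ridge = [], []
for _ in range(1000):
    y = X @ beta_true + rng.normal(scale=1.0, size=n)
    flat.append(fit(X, y, lam=0.0))
    ridge.append(fit(X, y, lam=5.0))

print("sd of beta1, flat prior :", np.std([b[0] for b in flat]))
print("sd of beta1, ridge prior:", np.std([b[0] for b in ridge]))
```

The flat-prior coefficients are much noisier, because the likelihood can barely distinguish the two correlated predictors; the prior supplies the missing information, which is the sense in which regularization makes the bigger model usable.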

I don’t have anything useful to say/ask here, but I do want to express that these breakdowns of the model-building process are very helpful to us newbies, or at least to this one!

Michael:

Thanks for letting me know. It means a lot to hear that people are actually reading these posts!

“To put it another way, models 1, 2, and 3 above are all special cases of model 4, but with very strong priors where one or two of the coefficients are assumed to be exactly zero.”

I feel like I need to put this on a sticky note on my monitor. I think I’ve read a similar statement either here or in ROS and every time I read it I smack myself in the head and say DUH.

So if I keep it up there I’ll either stop smacking myself in the head because I won’t forget or smack myself constantly. I’m putting my money on the former.

Thanks for the cogent breakdowns!

“To put it another way, models 1, 2, and 3 above are all special cases of model 4, but with very strong priors where one or two of the coefficients are assumed to be exactly zero. Paradoxically, putting Bayesian priors (or, more generally, some regularization) in model 4 allows us to fit a bigger, more general model than would otherwise be realistically possible.”

This is great! Not many comments on this post, but it’s a very helpful one. Having had to teach myself Bayesian modeling, which I now use every day, I can say it took me a while to realize this!

This is a very helpful example, thank you.

Would you also apply this sort of thinking to the sigma component of a model? For example, is it similarly practical to imagine that, say, for a standard Gaussian GLM, the model where sigma is constant is just a special case of the model where sigma is allowed to vary by all relevant groupings (e.g., the same groupings imposed on mu)? As a hypothetical default, should we have any more reason to believe that sigma is exactly the same for every grouping than we do for mu?

I ask this because, with the advent of brms, I find myself fiddling with “distributional” models quite often, and I often wonder if I’m going to end up overfitting too many models. I mean, I typically have good subject-matter reasons to fiddle, but in general should my paranoia in this regard be about level with my paranoia toward flexible models for mu?
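A small sketch of the same special-case idea for sigma (hypothetical data; a penalized-MLE stand-in for a hierarchical prior, in numpy/scipy rather than brms): a prior that shrinks each group’s log-sigma toward a shared value interpolates between the constant-sigma model (tight prior) and fully group-specific sigmas (loose prior).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical data: 3 groups with genuinely different residual scales.
sigmas_true = np.array([0.5, 1.0, 2.0])
groups = np.repeat(np.arange(3), 30)
y = rng.normal(scale=sigmas_true[groups])

def neg_log_post(log_sigma, tau):
    # Gaussian log-likelihood with group-specific sigma, plus a
    # normal(0, tau) penalty on each group's log-sigma deviation from
    # the shared mean log-sigma. tau -> 0 forces a common sigma
    # (the constant-sigma model); large tau leaves each group free.
    s = np.exp(log_sigma)
    ll = np.sum(-np.log(s[groups]) - 0.5 * (y / s[groups]) ** 2)
    dev = log_sigma - log_sigma.mean()
    penalty = -0.5 * np.sum((dev / tau) ** 2)
    return -(ll + penalty)

for tau in [0.01, 10.0]:
    result = minimize(neg_log_post, x0=np.zeros(3), args=(tau,))
    print(f"tau={tau}: sigma estimates =", np.exp(result.x).round(2))
```

With the tight prior the three sigma estimates collapse to a single pooled value; with the loose prior they move toward the separate group scales. So, as with mu, the question is not “constant or varying?” but how strongly to shrink.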