David Hsu writes:

I have a (perhaps) simple question about uncertainty in parameter estimates using multilevel models — what is an appropriate threshold for measure parameter uncertainty in a multilevel model?

The reason why I ask is that I set out to do a crossed two-way model with two varying intercepts, similar to your flight simulator example in your 2007 book. The difference is that I have a lot of predictors specific to each cell (I think equivalent to airport and pilot in your example), and I find after modeling this in JAGS, I happily find that the predictors are much less important than the variability by cell (airport and pilot effects). Happily because this is what I am writing a paper about.

However, I then went to check subsets of predictors using lm() and lmer(). I understand that they all use different estimation methods, but what I can’t figure out is why the errors on all of the coefficient estimates are *so* different.

For example, using JAGS, and then visualizing the predictors relative to zero (i.e., the null hypothesis) using a plot similar to your ANOVA graphs (figure 22.2, 22.3), I would find that if I made the error bars either based on 95% confidence intervals or +/- 2 standard deviations, one would conclude that the predictors are not very significant (since 2.5% and 97.5% limits span zero).

But if I use the lm() function to check the model without any varying intercepts, I get all of the predictors significant. It is based on 12,000 or so observations, so I guess I’d expect the standard errors to be low. But by the same token, I’d expect the standard deviation of the chains for each estimate to be equivalently low and asymptotically approaching the standard errors from the normal OLS. lmer() gives me coef.se that are somewhere in between.

My reply: First off, I’d recommend Stan rather than Bugs/Jags. Second, it can help to include informative priors. Even weak prior information (for example, half-Cauchy priors that bound the parameters away from unrealistically high values) can be useful in constraining group-level variance parameters (especially when the number of groups is small). Third, if you fit lm(), you’ll tend to get standard errors that are too small because you’re not incorporating the correlations in the unexplained errors.

Picking up on Andy’s last point the OLS-based (that is, lm()-type) analogue should probably apply a multiway cluster robust standard error estimator, a la Cameron, Gelbach and Miller (2011).