Allowing interaction terms to vary

Zoltan Fazekas writes:

I am a 2nd-year graduate student in political science at the University of Vienna. In my empirical research I often employ multilevel modeling, and recently I came across a situation that kept me wondering for quite a while. As I did not find much on this in the literature, and considering the topics that you work on and blog about, I figured I would try to contact you.

The situation is as follows: in a linear multilevel model, there are two important individual-level predictors (x1 and x2) and a set of controls. Let us assume there is a theoretically grounded argument for including an interaction between x1 and x2 in the model (x1*x2). Both x1 and x2 are allowed to vary randomly across groups. Would this directly imply that the coefficient of the interaction should also be allowed to vary across groups? The question is even more pressing if there is no specific hypothesis about the variance of the conditional effect across countries. And if we then add predictors on the second level for the x1 and x2 slopes, would these also be expected to influence the coefficient of the interaction between x1 and x2? This last step refers to the situation in which the slope of x1 is modeled as a function of u1 on the second level, and the slope of x2 as a function of u2, under the assumption that x1 is independent of u2, and x2 of u1. If the coefficient of the x1*x2 interaction is allowed to vary, we would have 3 varying parameters. In this case, following the recommendation in your book with Professor Hill, additional measures should be taken to model it. Also, I would expect the covariance structure here to be strikingly unusual.

I am a bit puzzled by this question, and, although it would be easy enough to check empirically, I wasn't looking for an "empirical answer" so much as something from statistical theory. I came across this situation in research carried out with a colleague in which political information was the response variable and education and media exposure (for example) were the two predictors of interest. The second level consisted of European countries.

My reply:

First off, except in unusual cases you should center (or approximately center) x1 and x2 before putting them in the model and letting their coefficients vary. Second, if these coefficients vary, you should almost certainly allow the intercept in your regression to vary as well. Third, you can certainly allow everything to vary. Fourth, it's ok to let the main effects vary but keep the interaction fixed; but if the interaction varies, really both main effects should vary too. Finally, you're gonna have a lot of variance parameters. If the number of groups is small, you might want to put in some prior information to stabilize your estimates. (This will be possible in bglmer.)
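In R, this advice can be sketched with lmer()-style formulas (lmer comes up later in this exchange). The data frame `dat` and the variable names below are hypothetical, standing in for whatever the actual data look like:

```r
library(lme4)

## Hypothetical data: outcome y, individual-level predictors x1 and x2,
## grouping factor country. All names here are illustrative.
dat$x1 <- dat$x1 - mean(dat$x1)   # center the predictors first
dat$x2 <- dat$x2 - mean(dat$x2)

## Varying intercept and varying main-effect slopes, interaction fixed:
m1 <- lmer(y ~ x1 * x2 + (1 + x1 + x2 | country), data = dat)

## Letting the interaction coefficient vary across countries as well:
m2 <- lmer(y ~ x1 * x2 + (1 + x1 * x2 | country), data = dat)

## With few countries, prior information on the covariance matrix can
## stabilize the estimates, e.g. via blmer()/bglmer() in the blme
## package (the wishart covariance prior shown here is one option):
library(blme)
m3 <- blmer(y ~ x1 * x2 + (1 + x1 * x2 | country), data = dat,
            cov.prior = "wishart")
```

Note that `(1 + x1 * x2 | country)` expands to random effects for the intercept, both main-effect slopes, and the interaction slope, which matches the "if the interaction varies, both main effects should vary" recommendation.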

You ask for some theory. The key theoretical ideas are invariance (or approximate invariance) and efficiency. (a) If you don't center the predictors, or if you allow the coefficients for x1 and x2 to vary without letting the intercept vary, your inferences are not even close to invariant to the selection of predictors in the model, even if the added predictors are noise. (b) If you have lots of data, it's better to let all the coefficients vary. If the sample size is small, efficiency considerations can guide how much modeling it's worth doing.

Fazekas then got back to me:

I did not mention it, but I always center the variables (given some cross-group aims, mostly at the grand mean) and always run varying-intercept, varying-slope models. When there is a predictor on the second level for a slope, I almost automatically include it as a predictor for the intercept as well. As far as I've seen, in lmer() this is the default, and it was clear from your book that only in very specific cases should one run a varying-slope but fixed-intercept model.

So this would have been the default setup:

(with variables grand mean centered)

y = b0 + b1 x1 + b2 x2 + b3 x1*x2 + b controls + e
b0 = g0 + ee0
b1 = g1 + ee1
b2 = g2 + ee2
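In lmer() syntax (mentioned above), this default setup would correspond to something like the following sketch, where the data frame and variable names are illustrative and `controls` stands in for whatever control variables are used:

```r
library(lme4)

## g0, g1, g2 are the fixed effects; ee0, ee1, ee2 are the country-level
## errors, i.e. the random intercept and the random slopes for x1 and x2.
## The interaction coefficient b3 is held constant across countries.
m_default <- lmer(y ~ x1 + x2 + x1:x2 + controls +
                    (1 + x1 + x2 | country), data = dat)
```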

But in this case your fourth point is reassuring, because generally there are not that many second level units.

The extended specification would be, with predictors:

y = b0 + b1 x1 + b2 x2 + b3 x1*x2 + b controls + e
b0 = g0 + g01 u1 + g02 u2 + ee0
b1 = g1 + g11 u1 + ee1
b2 = g2 + g21 u2 + ee2
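Because the slopes b1 and b2 are modeled as functions of the group-level predictors u1 and u2, those predictors enter an lmer()-style formula as cross-level interactions. A sketch under the same illustrative names as before:

```r
library(lme4)

## u1 and u2 are country-level predictors (constant within country).
## g11 corresponds to the x1:u1 term and g21 to x2:u2; g01 and g02
## correspond to the main effects of u1 and u2 on the intercept.
m_ext <- lmer(y ~ x1 + x2 + x1:x2 + u1 + u2 + x1:u1 + x2:u2 + controls +
                (1 + x1 + x2 | country), data = dat)
```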

As I understood your answer, the same logic of efficiency would apply even in this scenario.