In a multi-way analysis of variance setting, the number of possible predictors can be huge. For example, in a 10×19×50 array of continuous measurements, there is a grand mean, 10+19+50 main effects, 10×19+19×50+10×50 two-way interactions, and 10×19×50 three-way interactions. Multilevel (Bayesian) anova tools can be used to estimate the scales of each of these batches of effects and interactions, but such tools treat each batch of coefficients as exchangeable.
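To make the count concrete, here is a small sketch that tallies the coefficients at each order for a full factorial layout. The function name `count_effects` is made up for illustration; the only input is the array dimensions from the example above.

```python
from itertools import combinations
from math import prod


def count_effects(dims):
    """Return {order: number of coefficients} for a full factorial layout.

    Order 0 is the grand mean, order 1 the main effects, order 2 the
    two-way interactions, and so on.
    """
    counts = {0: 1}  # the grand mean
    for order in range(1, len(dims) + 1):
        # Each subset of factors contributes one coefficient per
        # combination of its levels.
        counts[order] = sum(prod(subset) for subset in combinations(dims, order))
    return counts


counts = count_effects((10, 19, 50))
total = sum(counts.values())
print(counts)  # {0: 1, 1: 79, 2: 1640, 3: 9500}
print(total)   # 11220
```

So the 9,500 data points come with 11,220 possible coefficients, which is why some structure on the batches is needed.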

But more information is available in the data. In particular, factors with large main effects tend to be more likely to have large interactions. From a Bayesian point of view, a natural way to model this pattern would be for the variance of each interaction to depend on the coefficients of its component parts.

**A model for two-way data**

For example, consider a two-way array of continuous data, modeled as y_ij = m + a_i + b_j + e_ij. The default model is e_ij ~ N(0,s^2). A more general model is e_ij ~ N(0,s^2_ij), where the standard deviation s_ij can depend on the scale of the main effects; for example, s_ij = s exp (A|a_i| + B|b_j|).

Here, A and B are coefficients which would have to be estimated from the data. A=B=0 is the standard model, and positive values of A and B correspond to the expected pattern of larger main effects having larger interactions. (We’re enclosing a_i and b_j in absolute values in the belief that large absolute main effects, whether positive or negative, are important.)
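As a rough illustration of what this model implies, the following sketch simulates a two-way array with s_ij = s exp (A|a_i| + B|b_j|). It is not the code we used for fitting. The helper names, the N(0,1) draws for the main effects, and the particular values of m, s, A, and B are all assumptions chosen for the example.

```python
import numpy as np


def interaction_sd(a, b, s, A, B):
    """s_ij = s * exp(A*|a_i| + B*|b_j|), returned as an array of shape (len(a), len(b))."""
    return s * np.exp(A * np.abs(a)[:, None] + B * np.abs(b)[None, :])


def simulate_two_way(n_rows, n_cols, m=0.0, s=1.0, A=0.5, B=0.5, seed=0):
    """Simulate y_ij = m + a_i + b_j + e_ij with e_ij ~ N(0, s_ij^2)."""
    rng = np.random.default_rng(seed)
    # Assumption for illustration: main effects are standard normal draws.
    a = rng.normal(0.0, 1.0, size=n_rows)
    b = rng.normal(0.0, 1.0, size=n_cols)
    sd = interaction_sd(a, b, s, A, B)
    e = rng.normal(0.0, sd)  # one draw per cell, each with its own scale
    y = m + a[:, None] + b[None, :] + e
    return y, a, b, sd


y, a, b, sd = simulate_two_way(10, 19)
# With A = B = 0 every cell has the same scale s, which is the default model.
flat_sd = interaction_sd(a, b, s=1.0, A=0.0, B=0.0)
```

With positive A and B, the rows and columns whose main effects are largest in absolute value get the noisiest cells, which is the pattern described above.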

We fit this model to some data (on public spending by state and year) and found mostly positive A’s and B’s.

The next step is to try the model on more data and to consider more complicated models for 3-way and higher interactions.

**Why are we doing this?**

The ultimate goal is to get better models that can allow deep interactions and estimate coefficients without requiring that entire levels of interactions (e.g., everything beyond 2-way) be set to zero.

An example of an applied research problem where we struggled with the inclusion of interactions is the study of the effects of incentives on response rates in sample surveys. Several factors can affect response rate, including the dollar value of the incentive, whether it is a gift or cash, whether it is given before or after the survey, whether the survey is telephone or face-to-face, etc. One could imagine estimating 6-way interactions, but with the available data it’s not realistic even to include all the three-way interactions. In our analysis, we just gave up and ignored high-level interactions, keeping various lower-level interactions based on a combination of data analysis and informal prior information, but we’d like to do better.

Jim Hodges and Paul Gustafson and Hugh Chipman and others have worked on Bayesian interaction models in various ways, and I expect that in a few years there will be off-the-shelf models to fit interactions. Just as hierarchical modeling itself used to be considered a specialized method and is now used everywhere, I expect the same for models of deep interactions.

Aleks commented:

The s_ij = s exp (A|a_i| + B|b_j|) is a way which I haven’t seen before. I have to ponder this for a while: in my experience these interaction effects are a bit hard to interpret. A powerful way of dealing with high-order interactions is to assume that the data is a mixture of several subpopulations. In the regression case, you then have a mixture of linear regression models. These “clusters” in the population then help to capture the interactions between variables in a way that resembles the tree models. Namely, a tree regression model can be interpreted as a mixture of linear models with a latent variable that identifies the leaf of the tree.