We all know to look at main effects first and then look for interactions. But a former student pointed me to some disturbing advice from some statistics textbooks. I’ll give his quotes and then my reactions:

Introductory Statistics Textbooks

There are a number of introductory textbooks that advise students to test the interaction first in a two-way ANOVA with interaction. But I [Shyue-Ming] will give just one quote, from the business statistics version of Moore and McCabe.

Moore, McCabe, Duckworth, and Sclove (2004) say the following on p. 15-17: “There are three null hypotheses in two-way ANOVA, with an F test for each. We can test for significance of the main effect of A, the main effect of B, and the AB interaction. It is generally good practice to examine the test interaction first, since the presence of a strong interaction may influence the interpretation of the main effects.”

Higher-level Books

I [Shyue-Ming] also include some quotes from higher-level books.

Chow (2003) on p. 1027 says: “To test the treatment effect for the two-way ANOVA model with interaction, the interaction effect has to be tested first. Otherwise, the result for testing the treatment effect cannot be interpreted in a statistically meaningful manner.”

Muller and Fetterman (2002) say the following on p. 381: “The interaction represents substantial investment in factorial design. Confidence in the absence of an interaction of consequence would allow conducting two much smaller studies, each with a one-way design, or a purposefully incomplete (and therefore much smaller) two-way design. As always, go ‘backwards’ by testing the interaction first. Model reduction (pooling) in the presence of nonsignificant interaction may be attractive for unbalanced or incomplete designs.”

Cabrera and McDougall (2002) say the following on p. 111: “As in the one-way case, a large F-value provides evidence against the null hypothesis for the corresponding effect. However, the AB interaction test should always be examined first. The reason for this is that there is little point in testing H_A or H_B if H_AB: no interaction effect is rejected, since the difference between any two levels of a main effect also includes an ‘average’ interaction effect. In other words, regardless of the test results for H_A and H_B, ‘both’ treatment factors are important if there is a significant interaction effect. (How could the interaction exist otherwise?)”

Bibliography

Cabrera and McDougall (2002). Statistical Consulting. Springer.

Chow, Shein-Chung (2003). Encyclopedia of Biopharmaceutical Statistics. Informa Health Care.

Moore, McCabe, Duckworth, and Sclove (2004). Practice of Business Statistics, Part IV (Chapters 12-18). Palgrave Macmillan.

Muller, Keith and Bethel Fetterman (2002). Regression and ANOVA: An Integrated Approach Using SAS Software. SAS Publishing.

My reaction:

OK, I think I see what’s going on. Generally, it’s good advice to include main effects first and then interactions. So, if you’re taking things *away* from the model, you should remove (i.e., “test for”) interactions first. That said, I think the above advice is confusing, and the framing in terms of null hypotheses is unhelpful. I’d rather talk about building up the model, starting with main effects and then adding interactions as appropriate.

A related point is that, in general, including an interaction changes the interpretation of coefficients for main effects. That’s something you have to be careful with, and Jennifer and I discuss it in chapters 3, 4, and 5 of our book.
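A tiny numerical sketch of that point, using made-up cell means for a balanced 2×2 design: with dummy coding, the A coefficient in an additive model is the effect of A averaged over B, but once an A:B interaction term enters, the A coefficient becomes the simple effect of A at the reference level B = 0. In a balanced design these contrasts equal the corresponding OLS coefficients exactly, so we can compute them directly:

```python
# Hypothetical cell means for a balanced 2x2 design, keyed by (A, B) levels.
cells = {(0, 0): 10.0, (1, 0): 12.0, (0, 1): 11.0, (1, 1): 17.0}

# Effect of A averaged over the levels of B: this is what the A coefficient
# means in an additive (no-interaction) dummy-coded model fit to balanced data.
avg_effect_A = ((cells[(1, 0)] - cells[(0, 0)])
                + (cells[(1, 1)] - cells[(0, 1)])) / 2

# Simple effect of A at B = 0: this is what the A coefficient means once a
# dummy-coded A:B interaction term is in the model.
effect_A_at_B0 = cells[(1, 0)] - cells[(0, 0)]

# The interaction: how much the effect of A changes across the levels of B.
interaction = ((cells[(1, 1)] - cells[(0, 1)])
               - (cells[(1, 0)] - cells[(0, 0)]))

print(avg_effect_A)    # 4.0
print(effect_A_at_B0)  # 2.0
print(interaction)     # 4.0
```

Same data, but the “main effect of A” coefficient is 4 without the interaction term and 2 with it, which is exactly why adding an interaction forces you to reinterpret the main-effect coefficients.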

I had a fairly traditional ANOVA class in grad school (in psychology), where we were told to look at the interactions first. But it was "look at," not "test." Notwithstanding the Muller & Fetterman quote about simplifying your models, the way ANOVA is often done in experimental psychology (and I wouldn't be surprised to hear in other experimental disciplines), there is little or no process of model-building — if you are analyzing an experiment with a factorial design, then all of the interactions are always in the model. When the interactions are not significant, many people will nevertheless report the main effects from a model that contains interactions. So in some of those quotes, "testing" the interactions first is probably a sloppy shorthand for "looking at" the interactions first. Under the hood, the interactions are always being tested in a model that also contains the main effects (and vice versa).

When I later took classes more focused on regression, SEM, MLM, etc., it was completely different. There the approach was much more about viewing data analysis as a process of model-building, and you don't automatically include every possible interaction between every combination of variables just because you could.

Some of these differences, I think, have to do with who was traditionally likely to use which analysis. In psych, ANOVA was (and still is) typically used by experimenters — and why would you run a factorial experiment unless you thought there might be interactions? Regression, by contrast, was traditionally used by people who do observational research. And if you have lots and lots of measured input variables, why would you include a bunch of interactions that you didn't predict, that you'd have no interpretation for, and that'll probably blow up your model?

Possibly worse is no advice at all. I'm teaching business stat this semester, and Monday is the 3-hour ANOVA lecture. It will not make YouTube.

Anderson, Sweeney, and Williams, Statistics for Business and Economics, 10th ed. (2008), spends only pages 521-526 out of 1018 pages on factorial designs. The book basically tells you that you can answer three questions (two main effects and one interaction), but doesn't even suggest you lay out the means in a row-by-column table, or graph them.

ANOVA's just skimmed over lightly in favor of regression (pages 543-743). Maybe I should just skip ANOVA entirely next time.

I remember that in my ANOVA class, we were taught to look at interactions first, exactly because if interactions were present it would change the interpretation of the main effects. Of course, you would never test a model with interactions that wouldn't include the main effects present in the interaction (to test interaction AB, both effects A and B have to be in the model, but they do not have to be significant).

Another reason we were taught to look at interactions first, I think, comes from an example: if we were to look at main effects first, we might find that effect A is significant while effect B isn't. One might then decide to remove B from the model (though that may not be everyone's first instinct). However, it could be that the interaction AB is in fact significant, but we would never test it because we would have discarded B in the first place.
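A minimal illustration of that trap, with hypothetical cell means for a crossover interaction: B's effect is +4 at one level of A and -4 at the other, so a main-effects-first screen sees a marginal B effect of zero and drops B, never reaching the large AB interaction:

```python
# Hypothetical cell means, keyed by (A, B) levels. B's effect reverses sign
# across the levels of A, so it cancels out in the margin.
cells = {(0, 0): 10.0, (0, 1): 14.0, (1, 0): 16.0, (1, 1): 12.0}

# Marginal (main) effect of B, averaged over A: exactly zero.
marginal_B = ((cells[(0, 1)] + cells[(1, 1)])
              - (cells[(0, 0)] + cells[(1, 0)])) / 2

# Simple effects of B within each level of A: large, but opposite in sign.
effect_B_at_A0 = cells[(0, 1)] - cells[(0, 0)]
effect_B_at_A1 = cells[(1, 1)] - cells[(1, 0)]

print(marginal_B)      # 0.0  -> a main-effects-first screen would drop B
print(effect_B_at_A0)  # 4.0
print(effect_B_at_A1)  # -4.0
```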

We were also taught, though, that not all interactions should be tested. For example, we should prioritize two-way interactions over three-way or four-way interactions (that's mostly a question of preserving degrees of freedom though).

However, in regression analysis where we are building a model, interactions come in only after we have weeded through the main effects, since there we are simply trying to build the best predictive model. Maybe that isn't always the best strategy, though.

I'd be interested in knowing what the arguments are for looking at main effects first and if there are ever disadvantages to doing so. The discussion interests me.

As various people have noticed the context is important. In experimental design and statistics classes for psychologists I teach them to look at interactions first in ANOVA. However, we're talking about orthogonal designs where the interactions and main effects are independent (which is part of the strict definition of ANOVA). All the tests are in practice simultaneous and it is very rare to run models sans interaction terms (typically only when there are insufficient df).

It doesn't make sense to 'test' the ANOVA first in this context – though I note that the Moore et al. text apparently says "generally good practice to examine the test interaction first" which suggests to me it should read "generally good practice to examine the test [of] interaction first" or similar. This doesn't seem to imply testing the interaction first.

You'll be happy to hear that my ANOVA professor made a point of saying that you should not interpret the interaction first, as was recommended in our text (Keppel and Wickens, Design and Analysis).

I'm still struggling with this. (I teach statistics to ecologists; I have a lot of experience and not very much formal training in statistics.) Many of the data sets that students work with are fairly small, noisy, and painstakingly gathered.

I was going to write a bunch of other stuff, but reading the comments above from Sanjay and Pierre-Hugues I guess I'll just say "me too" — I would have concerns that one might miss important stuff by concluding (because of strong interactions) that the main effects were just a mess and there wasn't really anything interesting going on in the data …

I think much of the difference between the model-building regression approach and the factorial ANOVA approach isn't so much regression vs. ANOVA as testing a hypothesized interaction vs. checking whether there might be an interaction.

In most of those psychology experiments, the researchers are hypothesizing interactions. As Sanjay points out, you set up a factorial design to test the interaction. I started out in psychology research and in my experience no one EVER set up a study that hypothesized main effects only. Those just weren't interesting hypotheses.

But even in a regression, especially in a designed study as opposed to secondary data analysis, some main effects just aren't meaningful without the interaction. I'm thinking, for example, of a study looking at the factors that affect the number of hours per week that someone works. There is no point in including age of youngest child without the interaction with gender. This is a case where a predictor has a huge effect on women, but not men. If you are hypothesizing an interaction, why leave it out until the end? The only reason I can think of is to show how much better the model fits with the interaction, and once you include the interaction, the coefficients of the component terms aren't main effects.