I was asked by the editors of the New Palgrave Dictionary of Economics (second edition) to contribute a short article on the analysis of variance. I don’t really know what economists are looking for here (the article is supposed to be aimed at the level of first-year graduate students), but I gave it a try. Here’s the article. Any comments (from economists or others) would be appreciated.

P.S. See here for revised version.

Econometrics is almost always taught in the context of regression. Econometricians are taught to test groups of variables through a joint test, and to test the value of pooling versus keeping groups separate by assessing the increase in explained sum of squares against the increased degrees of freedom.
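For readers who want the pooling test spelled out, here is a sketch in standard textbook notation. The symbols are assumptions introduced for illustration, not taken from the article: RSS is a residual sum of squares, q is the number of extra parameters used by the separate-groups model, n is the sample size, and k is the number of parameters in the separate-groups model.

```latex
F = \frac{\left(\mathrm{RSS}_{\text{pooled}} - \mathrm{RSS}_{\text{separate}}\right)/q}
         {\mathrm{RSS}_{\text{separate}}/(n-k)}
```

Because the total sum of squares is the same for both fits, the drop in RSS in the numerator equals the increase in explained sum of squares. Under the usual classical assumptions (normal, homoskedastic errors), this statistic is compared with an F distribution on q and n-k degrees of freedom.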

These have analogs in ANOVA, but most economists dismiss ANOVA relative to regression by arguing that regression methods provide as much information and also allow economically relevant interpretation of coefficients on individual variables.

Missing from the draft is a discussion of ANOVA versus regression, both in terms of conceptual links, and in practical terms what might be gained by using the ANOVA framework rather than a classic regression framework.

I second waldtest's comment. Regression is also felt to be less problematic with unbalanced samples. Goldberger's classic econometrics text Econometric Theory (1964) has a succinct description (pp. 227-231) of ANOVA that finishes with a paragraph relating it to regression analysis.

I agree. In my Ph.D. Econ program, ANOVA was never mentioned even once. (In fact I can still never remember how one is to pronounce the term.) I finally figured out for myself that ANOVA is basically regression with a different vocabulary and a focus on different things.

The last paragraph in Waldtest's comment describes exactly what economists need to know: "ANOVA versus regression, both in terms of conceptual links, and in practical terms what might be gained by using the ANOVA framework."

I too agree with the above comments. Most first year econ grad students think of an ANOVA table as a regression output, so I think some people might be somewhat puzzled by the opening line of the intro: "ANOVA represents a set of models that can be fit to data…."

Also, a couple typos:

pg2, bullet 1, "5 airports" => "8 airports";

pg 5, top line, "SS, df1" => "SS2, df2";

And finally, not a typo, but for the life of me, I can't manage to say "highly statistically-significantly larger" (pg2, bullet5), without tripping all over myself. Is it just me?

I agree with the above comments. Here is one question you might address: When is ANOVA preferred to linear regression?

Most economists, myself included, tend to think that linear regression gives you all the information that ANOVA does. It seems like ANOVA is pretty close to linear regression with a dummy variable.
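To make the dummy-variable point concrete, here is a minimal sketch in plain Python. The data, group labels, and function names are all hypothetical and chosen only for illustration. It computes the classical one-way ANOVA F statistic from between-group and within-group sums of squares, then the overall F statistic from an OLS regression on an intercept plus group dummies, and the two agree. It covers only this simple one-way case and says nothing about more structured designs.

```python
from statistics import mean

# Hypothetical data: three groups, deliberately unbalanced.
groups = {
    "a": [4.0, 5.0, 6.0],
    "b": [7.0, 9.0, 8.0, 8.0],
    "c": [1.0, 3.0, 2.0],
}

def anova_f(groups):
    """One-way ANOVA F statistic from between/within sums of squares."""
    all_y = [y for ys in groups.values() for y in ys]
    grand = mean(all_y)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_between = len(groups) - 1
    df_within = len(all_y) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def regression_f(groups):
    """Overall F statistic from OLS of y on an intercept plus dummies (first group omitted)."""
    labels = list(groups)
    X, y = [], []
    for g, ys in groups.items():
        for v in ys:
            X.append([1.0] + [1.0 if g == lab else 0.0 for lab in labels[1:]])
            y.append(v)
    k = len(labels)  # intercept + (number of groups - 1) dummies
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    rss = sum((v - f) ** 2 for v, f in zip(y, fitted))
    tss = sum((v - mean(y)) ** 2 for v in y)
    return ((tss - rss) / (k - 1)) / (rss / (len(y) - k))

f_anova = anova_f(groups)
f_reg = regression_f(groups)
print(f_anova, f_reg)
```

The regression's fitted values are just the group means, which is why its explained and residual sums of squares coincide with the between and within sums of squares in the ANOVA table.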

Economists have no need to study ANOVA. It's a special case of regression (with dummy variables), which is why economists don't study it. http://www.blackwell-synergy.com/doi/abs/10.1111/…

To the commenters above: the ways in which Anova is, and is not, a special case of classical regression are discussed in my Annals of Statistics paper. The short answer is that for some simple models, Anova is indeed a special case of regression, but for other models (notably those with hierarchical structure, such as split-plot designs), classical regression will _not_ give you the correct Anova results. Multilevel modeling will do it correctly, though, which is one reason I think of Anova as a way of structuring multilevel models. Anova ideas will take you further than you'll get by simply running regressions with dummy variables.