EDA for HLM

The paper begins with a gradual justification of the use of multilevel models and then discusses a specific example (a regression of voting participation on education) where it could be interesting to allow slopes to vary by state, with state-level predictors.

The paper is interesting, with a variety of pretty pictures. I have a couple of technical comments. First, I like that they pull out the state-level predictors and put them in Table 2. The individual-level dataset in Table 1 can then have state indicators without the state-level predictors. (This is an issue we discuss further in Chapter 7 of our forthcoming book on regression and multilevel models.)

Second, the paper has some useful discussion of correlations between varying intercepts and slopes. But much of the correlation in this example is a statistical artifact arising from the fact that the education predictor is far from 0. I’d suggest preprocessing the “years of education” variable by subtracting 12 before you start. This also gives the intercepts a direct interpretation.
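The artifact is easy to demonstrate by simulation. Here is a minimal sketch in Python/NumPy (all numbers are invented for illustration, not taken from the paper's data): per-state least-squares fits of a noisy outcome on years of education produce strongly negatively correlated intercept and slope estimates, and subtracting 12 shrinks that correlation substantially.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_per_state = 200, 50

raw, centered = [], []
for _ in range(n_states):
    educ = rng.integers(8, 21, size=n_per_state).astype(float)  # years of education
    y = 0.2 + 0.03 * educ + rng.normal(0, 1, size=n_per_state)  # noisy outcome
    # least-squares fit on the raw scale (np.polyfit returns slope, then intercept)
    b1, b0 = np.polyfit(educ, y, 1)
    raw.append((b0, b1))
    # same fit after subtracting 12 (high-school graduation)
    c1, c0 = np.polyfit(educ - 12, y, 1)
    centered.append((c0, c1))

raw = np.array(raw)
centered = np.array(centered)
corr_raw = np.corrcoef(raw[:, 0], raw[:, 1])[0, 1]
corr_centered = np.corrcoef(centered[:, 0], centered[:, 1])[0, 1]
print(f"intercept-slope corr: raw {corr_raw:.2f}, centered at 12 {corr_centered:.2f}")
```

With a predictor averaging around 14, the raw correlation is nearly -1 purely because the intercept is an extrapolation to 0 years of education; centering at 12 moves the intercept to a point with actual data.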

The pictures are interesting, but some tell more than others. In particular, Figure 2 seems pretty useless. For one thing, the intercepts don’t mean much since there’s almost nobody in the data with 0 years of education. (Actually, maybe those people with education less than 8 years should be moved up to 8, putting them all in the “no high school” category.)

Figure 3 is nice. But what is the ordering of the states? Perhaps it’s stated in the text, but it should be in the figure caption. Figure 4 is nice, and at this point I’d say: just fit the multilevel model and start displaying some inferences from that. Why bother with the noisy least-squares estimates? Similarly, Figure 5 is a mess because the super-noisy data set the scale and obscure all the interesting patterns.

Figure 7 just seems really silly to me. At this point, you’re developing a lot of theory to work with these noisy least-squares regression coefficients. I’d fit the multilevel model first, and then see to what extent the model is not fitting. Or maybe I’m misunderstanding what they’re doing (if so, I apologize).

In summary, I like the idea of making graphs, but in this case I think I’d rather start by fitting a reasonable multilevel model and then making some graphs from there: first some graphs to summarize the estimates, then some EDA graphs to check model fit and learn more.
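To make the pooling point concrete, here is a hedged sketch in Python/NumPy of the kind of shrinkage a multilevel model delivers (an empirical-Bayes-style approximation, not the paper's method; all numbers are hypothetical): each noisy per-state least-squares slope is pulled toward the grand mean, with noisier states pulled harder.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical per-state least-squares slopes and their standard errors
true_slopes = rng.normal(0.05, 0.01, size=50)   # modest real variation
se = rng.uniform(0.02, 0.08, size=50)           # some states are much noisier
est = true_slopes + rng.normal(0, se)           # noisy least-squares estimates

# crude empirical-Bayes partial pooling: shrink each estimate toward the
# grand mean, by an amount determined by its sampling variance relative
# to the estimated between-state variance
grand_mean = np.average(est, weights=1 / se**2)
between_var = max(np.var(est) - np.mean(se**2), 1e-6)
shrink = se**2 / (se**2 + between_var)          # in (0, 1); noisier -> more shrinkage
pooled = shrink * grand_mean + (1 - shrink) * est

# pooled estimates are less spread out than the raw least-squares ones
print(np.std(est), np.std(pooled))
```

This is why displays of the raw least-squares estimates (as in Figures 4–5) overstate the between-state variation: much of their spread is sampling noise that the model smooths away.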

Finally, I really like how they use informative x-axes on the plots. I hate it when people plot things in alphabetical order or using id numbers.

(In case you missed it above, here’s a link to the Bowers and Drake paper.)

1. Jake says:

Thanks very much for the comments! I'm posting now mostly to agree, but also to ask some additional questions and to clarify what I hope readers will learn from the paper (once suitably revised).

I have seen researchers from around the country present multilevel models where attributes of between 5 and 20 countries explained the behavior of around 1000 people surveyed per country. In such cases, it seems like simple graphical displays of within-country fits would help these folks answer their substantive questions more effectively than significance tests based on dubious assumptions. Plus, if people want to go on to specify and estimate a multilevel model, it seems useful to make sure that they know they are modeling the slopes of the within-country fits and using a particular distributional assumption to make this model work.
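A minimal sketch of the kind of display I have in mind, in Python/NumPy (the countries and values are invented for illustration): compute the within-country least-squares fits, then order the countries by estimated slope so the eventual plot has an informative axis rather than alphabetical order or id numbers.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical survey data: ~1000 respondents in each of 12 countries
countries = [f"country_{i}" for i in range(12)]
fits = {}
for name in countries:
    x = rng.normal(0, 1, size=1000)                   # centered predictor
    slope = rng.normal(0.5, 0.3)                      # country-specific effect
    y = 0.1 + slope * x + rng.normal(0, 1, size=1000)
    b, a = np.polyfit(x, y, 1)                        # within-country least squares
    fits[name] = (a, b)

# order countries by estimated slope for an informative axis
order = sorted(fits, key=lambda k: fits[k][1])
slopes = [fits[k][1] for k in order]
```

The `(intercept, slope)` pairs in `fits` are exactly what one would plot, one panel or point per country, before committing to a distributional assumption for the slopes.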

I completely agree that in the example we presented in this draft, the regressions are really noisy and probably should be seen as an opportunity to use the pooling power of a multilevel model rather than as a source of much information about those units. Given your comments, I think we'll revise this paper to use one of the cross-national public opinion datasets so that our within-country regressions will be less noisy and so our analyses will be more intellectually interesting to the most important part of our intended audience (i.e., researchers interested in how country-level attributes affect individual-level behavior).

I also agree about the need to center the individual-level explanatory variable. I was conflicted about whether or not to somehow center the education variable (as you can see in our footnote 14 on page 17), and now I will do so.

About just specifying a model and then checking it: I'm afraid to recommend that people model first and check later. Most of them are not aware of the assumptions that are built into their software. And I suspect that, by focusing on single coefficients in additive linear models, they are smoothing over interesting features of these datasets where the number of level-2 units is small. If they are convinced that a multilevel model is the right thing to do, then of course they should estimate one and do model checking. But I worry that people are not getting to know their data well enough before specifying their models. And I don't want to recommend endless rounds of model checking, for fear of wasting people's time and also for fear of what their p-values will mean in the end (again, assuming that they'll mostly be using likelihood-based software and approaches).

Also, the audiences for such models are not currently very sophisticated when it comes to statistics. Thus, graphical displays or simple tables might be more rhetorically powerful, as well as better at illuminating the particular dataset under scrutiny, than models that might seem to them a bit like black boxes.

For these reasons, I would really like to recommend and show people what to do before they specify a model. And, once they know that they are modeling the within-country slopes and are happy with their assumptions, then they should estimate the model and check their assumptions and fit. I just read Rubin's article in the March 2005 JASA and found one line that helped me think more clearly about what I'd like to help people do with this paper. He says, "It is the scientific quality of those assumptions, not their existence, that is critical" (p. 324). I'd like to help people interested in cool questions about institutions and human behavior to (1) understand what it is that their models are actually doing and (2) assess the scientific quality of their assumptions using their data. I think part of goal (2) is taken care of by your work on EDA and by others who have taught us about influence, leverage, and other diagnostics for multilevel models. But I think the part that happens before the model is a bit hazy right now, and I think that part has the potential to reveal a lot about the phenomenon under study, above and beyond the model-specification guidance it can provide. Does this make sense?

I'm sure that the thoughts we present in our paper are not the final word on this. But, I figured it was worth getting something out there for folks to chew on and respond to, in any case.