Jake Bowers sent me a paper he and Katherine Drake wrote on exploratory data analysis for multilevel models.

**My comments**

The paper begins with a gradual justification of the use of multilevel models and then discusses a specific example (a regression of voting participation on education) where it could be interesting to allow slopes to vary by state, with state-level predictors.

The paper is interesting, with a variety of pretty pictures. I have a couple of technical comments. First, I like that they pull out the state-level predictors and put them in Table 2. The individual-level dataset in Table 1 can then have state indicators without the state-level predictors. (This is an issue we discuss further in Chapter 7 of our forthcoming book on regression and multilevel models.)

Second, the paper has some useful discussion of correlations between varying intercepts and slopes. But much of the correlation in this example is a statistical artifact arising from the fact that the education predictor is far from 0. I’d suggest pre-processing the “years of education” variable by subtracting 12 before you start. This also gives you a direct interpretation of the intercepts.
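To see the artifact concretely, here’s a minimal simulated sketch (the data and parameter values are invented, not taken from the paper): when each state’s outcome level at 12 years of education and its slope vary independently, the least-squares intercepts at 0 years end up strongly correlated with the slopes, and centering at 12 removes most of that correlation.

```python
# Simulated illustration (not the actual survey data): per-state
# intercepts and slopes estimated by least squares, with and without
# centering education at 12 years.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_per = 20, 50

# True per-state quantities: the outcome at 12 years of education and
# the slope vary independently across states.
level_at_12 = rng.normal(0.5, 0.1, n_states)
slope = rng.normal(0.05, 0.02, n_states)

int_raw, int_ctr, slp = [], [], []
for g in range(n_states):
    educ = rng.integers(8, 21, n_per).astype(float)  # years, 8-20
    y = level_at_12[g] + slope[g] * (educ - 12) + rng.normal(0, 0.1, n_per)
    b1, b0 = np.polyfit(educ, y, 1)       # intercept = value at 0 years
    _, b0c = np.polyfit(educ - 12, y, 1)  # intercept = value at 12 years
    int_raw.append(b0); int_ctr.append(b0c); slp.append(b1)

r_raw = np.corrcoef(int_raw, slp)[0, 1]
r_ctr = np.corrcoef(int_ctr, slp)[0, 1]
print(f"corr(intercept, slope): raw x = {r_raw:.2f}, centered x = {r_ctr:.2f}")
```

With the raw predictor, the intercept is an extrapolation to 0 years, so any noise or variation in the slope drags the intercept the other way; after centering, the intercept is the value at 12 years and the artifact largely disappears.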

The pictures are interesting, but some tell more than others. In particular, Figure 2 seems pretty useless. For one thing, the intercepts don’t mean much, since there’s almost nobody in the data with 0 years of education. (Actually, maybe those people with less than 8 years of education should be moved up to 8, putting them all in the “no high school” category.)

Figure 3 is nice. But what is the ordering of the states? Perhaps it’s stated in the text, but it should be in the figure caption. Figure 4 is nice, and at this point I’d say: just fit the multilevel model and start displaying some inferences from that. Why bother with the noisy least-squares estimates? Similarly, Figure 5 is a mess because the super-noisy data set the scale and obscure all the interesting patterns.

Figure 7 just seems really silly to me. At this point, you’re developing a lot of theory to work with these noisy least-squares regression coefficients. I’d fit the multilevel model first, and then see to what extent the model is not fitting. Or maybe I’m misunderstanding what they’re doing (if so, I apologize).

In summary, I like the idea of making graphs, but in this case I think I’d rather start by fitting a reasonable multilevel model and then making some graphs from there: first some graphs to summarize the estimates, then some EDA graphs to check model fit and learn more.

Finally, I really like how they use informative x-axes on the plots. I hate it when people plot things in alphabetical order or using id numbers.

(In case you missed it above, here’s a link to the Bowers and Drake paper.)

Thanks very much for the comments! I'm posting now mostly to agree, but also to ask some additional questions and to clarify what I hope readers will learn from the paper (once suitably revised).

Our motivation to write this article came from seeing colleagues around the country present multilevel models where attributes of between 5 and 20 countries explained the behavior of around 1000 people surveyed per country. In such cases, it seems like simple graphical displays of within-country fits would help these folks answer their substantive questions more effectively than significance tests based on dubious assumptions. Plus, if people want to go on to specify and estimate a multilevel model, it seems useful to make sure that they know they are modeling the slopes of the within-country fits and using a particular distributional assumption to make this model work.

I completely agree that in the example we presented in this draft, the regressions are really noisy and probably should be seen as an opportunity to use the pooling power of a multilevel model rather than as a source of much information about those units. Given your comments, I think we'll revise this paper to use one of the cross-national public opinion datasets so that our within-country regressions will be less noisy and our analyses will be more intellectually interesting to the most important part of our intended audience (i.e., researchers interested in how country-level attributes affect individual-level behavior).

I also agree about the need to center the individual-level explanatory variable. I was conflicted about whether or not to somehow center the education variable (as you can see in our footnote 14 on page 17), and after seeing your comments I think this is something we should do.

About just specifying a model and then checking it: I'm afraid to recommend that people model first and check later. Most of them are not aware of the assumptions that are built into their software. And, I suspect that by focusing on single coefficients in additive linear models, they are smoothing over interesting features of these datasets where the number of level-2 units is small. If they are convinced that a multilevel model is the right thing to do, then of course they should estimate one and do model checking. But I worry that people are not getting to know their data well enough before specifying their models. And I don't want to recommend endless rounds of model checking, for fear of wasting people's time and also for fear of what their p-values will mean in the end (again, assuming that they'll mostly be using likelihood-based software and approaches).

Also, the audiences for such models are not currently very sophisticated when it comes to statistics. Thus, graphical displays or simple tables might be more rhetorically powerful, as well as better at illuminating the particular dataset under scrutiny, than models that might seem to them a bit like black boxes.

For these reasons, I would really like to recommend and show people what to do before they specify a model. And, once they know that they are modeling the within-country slopes, and are happy with their assumptions, then they should estimate the model and check their assumptions and fit. I just read Rubin's article in the March 2005 JASA and found one line that helped me think more clearly about what I'd like to help people do with this paper. He says, "It is the scientific quality of those assumptions, not their existence, that is critical" (p. 324). I'd like to help people interested in cool questions about institutions and human behavior to (1) understand what it is that their models are actually doing and (2) assess the scientific quality of their assumptions using their data. I think part of goal (2) is taken care of by your work on EDA and by others who have taught us about influence, leverage, and other diagnostics for multilevel models. But I think the part that happens before the model is a bit hazy right now, and I think that part has the potential to reveal a lot about the phenomenon under study, above and beyond the model-specification guidance it can provide. Does this make sense?

I'm sure that the thoughts we present in our paper are not the final word on this. But I figured it was worth getting something out there for folks to chew on and respond to, in any case.

Thanks again for your comments!

Jake,

One way of thinking of what you're doing (fitting separate regressions within each group) is as an approximation to multilevel modeling: an approximation that works well for groups with lots of data. It's related to the method we call "the secret weapon" for time-series cross-sectional data.
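As a rough sketch of that per-group idea (simulated data; the group labels and coefficient values here are invented for illustration), one fits the same regression separately within each group and collects estimate and standard error for display side by side:

```python
# "Secret weapon" sketch: fit the same regression separately within
# each group, then collect slope estimates and standard errors so
# they can be plotted next to one another. Data are simulated.
import numpy as np

rng = np.random.default_rng(1)
groups = [f"group_{i}" for i in range(8)]

estimates = {}
for i, g in enumerate(groups):
    n = 200
    x = rng.normal(0, 1, n)
    y = (0.5 + 0.1 * i) * x + rng.normal(0, 1, n)  # slope drifts across groups
    b1, b0 = np.polyfit(x, y, 1)                   # within-group least squares
    resid = y - (b0 + b1 * x)
    # classical standard error of the slope: sqrt(RSS/(n-2) / Sxx)
    se = np.sqrt(resid @ resid / (n - 2) / ((x - x.mean()) ** 2).sum())
    estimates[g] = (b1, se)

for g, (b1, se) in estimates.items():
    print(f"{g}: {b1:.2f} +/- {se:.2f}")  # ready to plot estimate +/- 1 se
```

Lining the intervals up on a common axis (ordered by a meaningful group-level variable, per the point about informative x-axes) is the whole trick; no pooling model is needed when each group has enough data.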

So, yeah, what you're doing seems reasonable. I'd like to think that, soon enough, mlms will be easy enough to implement computationally that it'll be just as easy to fit an mlm as to do what you're doing. (Just as, nowadays, multiple regression is so easy that we don't feel the need to compute a bunch of partial correlations as a buildup.)

In any case, the same techniques you've used can also be used to display estimates from the mlm (if you choose to fit one). In that case, I'd favor plotting single posterior draws (i.e., random imputations) rather than posterior means or medians, which are subject to inferential artifacts (see my paper "All Maps of Parameter Estimates are Misleading" and also various papers of Tom Louis).