Jake Bowers sent me a paper he and Katherine Drake wrote on exploratory data analysis for multilevel models.

My comments

The paper begins with a gradual justification of the use of multilevel models and then discusses a specific example (a regression of voting participation on education) where it could be interesting to allow slopes to vary by state, with state-level predictors.

The paper is interesting, with a variety of pretty pictures. I have a couple of technical comments. First, I like that they pull out the state-level predictors and put them in Table 2. The individual-level dataset in Table 1 can then have state indicators without the state-level predictors. (This is an issue we discuss further in Chapter 7 of our forthcoming book on regression and multilevel models.)

Second, the paper has some useful discussion of correlations in varying intercepts and slopes. But much of the correlation in this example is a statistical artifact arising from the fact that the education predictor is far from 0. I’d suggest pre-processing the “years of education” by subtracting 12 before you start. This also gives you a direct interpretation of the intercepts.
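To see why the centering helps, here is a minimal sketch with simulated data (variable names and numbers are hypothetical, and a real turnout analysis would use a logistic model on binary votes): shifting the predictor leaves the slope unchanged but moves the intercept to the predicted outcome at 12 years of education, i.e. high-school completion.

```python
import numpy as np

# Hypothetical data: years of education and a continuous stand-in
# for turnout propensity, just to illustrate the centering step.
rng = np.random.default_rng(0)
educ = rng.integers(8, 21, size=500).astype(float)   # years of education
vote = 0.3 + 0.03 * educ + rng.normal(0, 0.1, size=500)

# Fit the same line with raw and with centered education.
slope_raw, intercept_raw = np.polyfit(educ, vote, 1)
slope_c, intercept_c = np.polyfit(educ - 12, vote, 1)

# The slope is identical; the centered intercept equals the
# prediction at 12 years: intercept_raw + 12 * slope_raw.
print(slope_raw, slope_c)
print(intercept_c, intercept_raw + 12 * slope_raw)
```

The same shift-invariance is why the intercept–slope correlation across states largely disappears once the predictor is centered near the bulk of the data.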

The pictures are interesting, but some tell more than others. In particular, Figure 2 seems pretty useless. For one thing, the intercepts don’t mean much, since there’s almost nobody in the data with 0 education. (Actually, maybe those people with education less than 8 years should be moved up to 8, putting them all in the “no high school” category.)

Figure 3 is nice. But what is the ordering of the states? Perhaps it’s stated in the text, but it should be in the figure caption. Figure 4 is nice, and at this point I’d say: just fit the multilevel model and start displaying some inferences from that. Why bother with the noisy least-squares estimates? Similarly, Figure 5 is a mess because the super-noisy data set the scale and obscure all the interesting patterns.

Figure 7 just seems really silly to me. At this point, you’re developing a lot of theory to work with these noisy least-squares regression coefficients. I’d fit the multilevel model first, and then see to what extent the model is not fitting. Or maybe I’m misunderstanding what they’re doing (if so, I apologize).

In summary, I like the idea of making graphs, but in this case I think I’d rather start by fitting a reasonable multilevel model and then making some graphs from there: first some graphs to summarize the estimates, then some EDA graphs to check model fit and learn more.

Finally, I really like how they use informative x-axes on the plots. I hate it when people plot things in alphabetical order or using id numbers.

(In case you missed it above, here’s a link to the Bowers and Drake paper.)


  1. Jake says:

    Thanks very much for the comments! I'm posting now mostly to agree, but also to ask some additional questions and to clarify what I hope readers will learn from the paper (once suitably revised).

    Our motivation to write this article came from seeing colleagues around the country present multilevel models where attributes of between 5 and 20 countries explained the behavior of around 1000 people surveyed per country. In such cases, it seems like simple graphical displays of within-country fits would help these folks answer their substantive questions more effectively than significance tests based on dubious assumptions. Plus, if people want to go on to specify and estimate a multilevel model, it seems useful to make sure that they know they are modeling the slopes of the within-country fits and using a particular distributional assumption to make this model work.

    I completely agree that in the example we presented in this draft, the regressions are really noisy and probably should be seen as an opportunity to use the pooling power of a multilevel model rather than as a source of much information about those units. Given your comments, I think we'll revise this paper to use one of the cross-national public opinion datasets so that our within-country regressions will be less noisy and so our analyses will be more intellectually interesting to the most important part of our intended audience (i.e. researchers interested in how country-level attributes affect individual-level behavior).

    I also agree about the need to center the individual-level explanatory variable. I was conflicted about whether or not to somehow center the education variable (as you can see in our footnote 14 on page 17) and after seeing your comments think that this is something we should do.
    About just specifying a model and then checking it: I'm afraid to recommend that people model first and check later. Most of them are not aware of the assumptions that are built into their software. And, I suspect by focusing on single coefficients in additive linear models, they are smoothing over interesting features of these datasets where the number of level-2 units is small. If they are convinced that a multilevel model is the right thing to do, then of course they should estimate one and do model checking. But, I worry that people are not getting to know their data well enough before specifying their models. And, I don't want to recommend endless rounds of model checking for fear of wasting people's time and also for fear of what their p-values will mean in the end (again, assuming that they'll mostly be using likelihood-based software and approaches).

    Also, the audiences for such models are not currently very sophisticated when it comes to statistics. Thus, graphical displays or simple tables might be more rhetorically powerful, as well as better at illuminating the particular dataset under scrutiny, than models that might seem to them a bit like black boxes.

    For these reasons, I would really like to recommend and show people what to do before they specify a model. And, once they know that they are modeling the within-country slopes, and are happy with their assumptions, then they should estimate the model, and check their assumptions and fit. I just read Rubin's article in the March 2005 JASA and found one line that helped me think more clearly about what I'd like to help people do with this paper. He says, "It is the scientific quality of those assumptions, not their existence, that is critical." (324). I'd like to help people interested in cool questions about institutions and human behavior to (1) understand what it is that their models are actually doing and (2) assess the scientific quality of their assumptions using their data. I think part of goal (2) is taken care of by your work on EDA and by others who have taught us about influence, leverage, and other diagnostics for multilevel models. But, I think the part that happens before the model is a bit hazy right now, and I think that part has the potential to reveal a lot about the phenomenon under study, above and beyond the model specification guidance it can provide. Does this make sense?
    I'm sure that the thoughts we present in our paper are not the final word on this. But, I figured it was worth getting something out there for folks to chew on and respond to, in any case.

    Thanks again for your comments!

  2. Andrew says:


    One way of thinking of what you're doing (fitting separate regressions within each group) is as an approximation to multilevel modeling, an approximation that works well for groups with lots of data. It's related to the method we call "the secret weapon" for time-series cross-sectional data.
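The separate-regressions idea can be sketched in a few lines; everything here is simulated and the names are hypothetical, but it shows the basic move of fitting the same regression within each group and collecting the coefficients for display.

```python
import numpy as np

# "Secret weapon" sketch: fit the same regression separately within
# each group (here, four hypothetical states) and collect the fits.
rng = np.random.default_rng(1)
states = ["A", "B", "C", "D"]
fits = {}
for i, s in enumerate(states):
    n = 200                                  # plenty of data per group,
    x = rng.normal(0, 1, n)                  # where separate fits work well
    true_slope = 0.5 + 0.1 * i               # slopes vary by state
    y = 1.0 + true_slope * x + rng.normal(0, 1, n)
    slope, intercept = np.polyfit(x, y, 1)
    fits[s] = (intercept, slope)

# The collected slopes can then be plotted against a state-level
# predictor, ordered informatively rather than alphabetically.
for s, (a, b) in fits.items():
    print(s, round(a, 2), round(b, 2))
```

With many observations per group, each within-group estimate is precise enough to read on its own; with few, the noisy estimates are better treated as input to a multilevel model that pools them.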

    So, yeah, what you're doing seems reasonable. I'd like to think that, soon enough, mlms will be easy enough to implement computationally that it'll be just as easy to do a mlm as to do what you're doing. (Just as, nowadays, multiple regression is so easy that we don't feel the need to compute a bunch of partial correlations as a buildup.)

    In any case, the same techniques you've used can also be used to display estimates from the mlm (if you choose to fit one). In that case, I'd favor plotting single posterior draws (i.e., random imputations) rather than posterior means or medians, which are subject to inferential artifacts (see my paper "All Maps of Parameter Estimates are Misleading" and also various papers of Tom Louis).
