Graphical tools for understanding multilevel models

There are a few things I want to do:

1. Understand a fitted model using tools such as average predictive comparisons, R-squared, and partial pooling factors. In defining these concepts, Iain and I came up with some clever tricks, including (but not limited to):

– Separating the inputs and averaging over the values of the inputs not being altered (for average predictive comparisons);

– Defining partial pooling without referring to a raw-data or maximum-likelihood or no-pooling estimate (these don’t necessarily exist when you’re fitting logistic regression with sparse data);

– Defining an R-squared for each level of a multilevel model.

The methods get pretty complicated, though, and they have some loose ends–in particular, for average predictive comparisons with continuous input variables.

So now we want to implement these in R and put them into arm along with bglmer etc.
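As a sketch of what such an R implementation might look like, here is a minimal average predictive comparison for a fitted glm, averaging the change in the expected response over the observed values of the inputs not being altered. The function name apc and its arguments are my own illustration, not the arm API:

```r
# Sketch of an average predictive comparison (APC) for one input.
# We set the chosen input to "lo" and "hi" for every data point,
# leaving the other inputs at their observed values, and average
# the change in the expected response.
apc <- function(fit, data, input, lo, hi) {
  d_hi <- data; d_hi[[input]] <- hi
  d_lo <- data; d_lo[[input]] <- lo
  mean(predict(fit, newdata = d_hi, type = "response") -
       predict(fit, newdata = d_lo, type = "response"))
}

# Example: logistic regression with two inputs
set.seed(1)
d <- data.frame(x = rnorm(500), z = rnorm(500))
d$y <- rbinom(500, 1, plogis(0.5 + 1.2 * d$x - 0.8 * d$z))
fit <- glm(y ~ x + z, family = binomial, data = d)
apc(fit, d, "x", lo = 0, hi = 1)  # average change in Pr(y = 1) as x: 0 -> 1
```

Because the model is nonlinear, this is not simply the coefficient on x; the averaging over the other inputs is doing real work.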

2. Setting up coefplot so it works more generally (that is, so the graphics look nice for models with one predictor, two predictors, or twenty predictors). Also a bunch of expansions to coefplot:

– Defining coefplot for multilevel models

– Also displaying average predictive comparisons for nonlinear models

– Setting it up to automatically display several regressions in a large “table”

3. Automatic plots showing data and fitted regression lines/curves. With multiple inputs, you hold all the inputs but one to fixed values–it’s sort of like an average predictive comparison, but graphical. We also have to handle interactions and multilevel models.

4. Generalizing R-squared and partial pooling factors for multivariate (varying-intercept, varying-slope) models.
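For concreteness, one way to compute a pooling factor directly from posterior simulations of the group-level errors, so that no raw-data or no-pooling estimate is ever needed, is lambda = 1 - var_j(E[eps_j]) / E[var_j(eps_j)], where E averages over posterior draws and var_j is the variance across groups. This is my reading of the Gelman and Pardoe definition, and the sketch below is illustrative rather than arm code:

```r
# Pooling factor from an S x J matrix of posterior simulations of the
# group-level errors (rows = draws, columns = groups).
# lambda near 1: heavy pooling (group errors poorly identified);
# lambda near 0: errors essentially pinned down by the data.
pooling_factor <- function(eps_sims) {
  1 - var(colMeans(eps_sims)) / mean(apply(eps_sims, 1, var))
}

# Toy check: errors identical across draws -> no pooling (lambda = 0)
fixed_eps <- rbind(c(-1, 0, 1),
                   c(-1, 0, 1))
pooling_factor(fixed_eps)  # 0

# Toy check: posterior means all zero but draws vary -> lambda = 1
sym_eps <- rbind(c(1, 2, 3),
                 c(-1, -2, -3))
pooling_factor(sym_eps)  # 1
```

The same function applies unchanged to the logistic case, since it only touches the simulated errors, not the likelihood.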

5. Graphs showing what happens as you add a multilevel component to a model. This is something I’ve been thinking about for a while, ever since doing the police stop-and-frisk model with Jeff Fagan and Alex Kiss. I wanted a graph that showed how the key estimates were changing when we went multilevel, and what in the data was making the change.

One reason to see why this sort of process explanation is important is . . . we give them all the time. We’re always telling data-and-model stories of why, when we control for variable X, our estimate for variable Y changes, or why our multilevel estimates are a compromise between something and something else. What I’d like to do is formalize and automate these explanations. Just as we have formalized and automated much of inference, and have (to some extent) formalized and automated graphical model checking, I’d like to formalize and automate the steps of understanding a fitted model–which often requires understanding it in the context of fitted alternatives that are close by in model space.

6. I have more ideas but this is enough for now.

7 thoughts on “Graphical tools for understanding multilevel models”

  1. Thanks for linking to your JASA "Stop-and-Frisk" paper. That's a nice piece of data analysis! It must have been quite a chore to collect and prepare the data for analysis.

    I'm curious why "Percentage of blacks in precinct" wasn't included as a covariate in the regression analysis? Figs. 2-3 display the regression results for various "partitioning [of] the precincts into different numbers of categories" (p. 818). Perhaps that was done so that the results could be more easily explained to police and politicians?

  2. Great papers. I was wondering if you have some advice about computing the pooling factor for multilevel logistic regression.

    Thanks
    Manoel

  3. I have done a moderate amount of work extending coefplot (I've been calling my version coefplot2), although not necessarily in the same directions you're interested in — I've wanted it to handle objects from as large a range of R's modeling tools as possible, e.g. MCMCglmm, lme4, nlme, etc. Are you interested in having a look? (I've got a project page for it on R-forge but haven't gotten around to setting it up yet.)

  4. I just uploaded "coefplot2" to R-forge; it may take 24 hours for packages to be available for download, but you can get the SVN tree now.

  5. #3 stirs some thoughts.
    "3. Automatic plots showing data and fitted regression lines/curves. With multiple inputs, you hold all the inputs but one to fixed values–it's sort of like an average predictive comparison, but graphical. We also have to handle interactions and multilevel models."

    I've long been a fan of Tukey, Tufte, and Cleveland for ideas on using good graphs to gain insight and/or communicate it in print, so maybe you can include some commentary:

    To what extent are your choices in graphics driven by the medium and specific restrictions:

    1) Black and white print, possibly with greyscale.

    2) Color print.

    3) Color display, with various degrees of interactivity, from simple rotation/zoom, to cut planes to slider controls, or time-varying graphical representations.

    For example, given Z = f(X, Y), one might display a 3D surface, and then use a cut plane parallel to the X (or Y) axis to hold Y (or X) constant, etc.

    Perhaps there are equivalents for statistical displays of 3D volume visualizations, density displays, etc., akin to those used in medical visualization or fluid-flow work to show multivariate data. Sometimes even simple tools can be useful: recall the Jurassic Park movie where the kid sits at a computer, says "UNIX, I know this," and flies around in a simple visual representation. A related version of that tool showed per capita income as one bar per U.S. state for the last 100 years, with a user-controlled slider to select the year. As one slid the control, the bars would jiggle up and down, and the human visual system would quickly detect rapid changes in one state in the context of nearby states. For example, there were some multi-year periods when Kansas or South Dakota jumped, stayed there, and then came back down to the level of nearby states. Those turned out to be when they were building Minuteman missile silos.

    Even in print form, I observe the display makes a big difference. For instance, Figure 6.10(b) here is a "spaghetti graph" of various reconstructions of temperature. Of course, the original graphs typically have error bars, but when put together, they get lost, and it is very hard to tell the extent of agreement/disagreement, which has confused many people. In that version, they added Fig 6.10(c), which in effect shows a density plot to try to explain the overlap … which I think is better, but not a typical chart.

    Anyway, the general hope is for discussion about assumptions on the graphics media being targeted and what you might do with different media.
