MRP (multilevel regression and poststratification; Mister P): Clearing up misunderstandings about

Someone pointed me to this thread where I noticed some issues I’d like to clear up:

David Shor: “MRP itself is like, a 2009-era methodology.”

Nope. The first paper on MRP was from 1997. And, even then, the component pieces were not new: we were just basically combining two existing ideas from survey sampling: regression estimation and small-area estimation. It would be more accurate to call MRP a methodology from the 1990s, or even the 1970s.

Will Cubbison: “that MRP isn’t a magic fix for poor sampling seems rather obvious to me?”

Yep. We need to work on both fronts: better data collection and better post-sampling adjustment. In practice, neither alone will be enough.

David Shor: “2012 seems like a perfect example of how focusing on correcting non-response bias and collecting as much data as you can is going to do better than messing around with MRP.”

There’s a misconception here. “Correcting non-response bias” is not an alternative to MRP; rather, MRP is a method for correcting non-response bias. The whole point of the “multilevel” (more generally, “regularization”) in MRP is that it allows us to adjust for more factors that could drive nonresponse bias. And of course we used MRP in our paper where we showed the importance of adjusting for non-response bias in 2012.
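
To make the arithmetic concrete, here is a minimal sketch of the poststratification step alone, with made-up numbers: two education groups, a sample that over-represents college graduates, and population shares assumed known from a census-like source. The group names, counts, and shares are all hypothetical, and the raw cell proportions stand in for what would be multilevel model estimates in real MRP.

```python
# Hypothetical numbers, chosen only to illustrate the arithmetic.
population_share = {"no_college": 0.60, "college": 0.40}  # assumed known
sample_n = {"no_college": 300, "college": 700}            # college grads over-respond
sample_yes = {"no_college": 120, "college": 420}          # "yes" answers per cell


def raw_mean(yes, n):
    """Unadjusted sample proportion, ignoring who responded."""
    return sum(yes.values()) / sum(n.values())


def poststratified_mean(cell_estimates, shares):
    """Weight each cell estimate by its population share, not its sample share."""
    total = sum(shares.values())
    return sum(cell_estimates[c] * shares[c] for c in shares) / total


# In real MRP these cell estimates would come from a multilevel regression;
# here they are just the raw cell proportions.
cell_estimates = {c: sample_yes[c] / sample_n[c] for c in sample_n}

raw = raw_mean(sample_yes, sample_n)
adjusted = poststratified_mean(cell_estimates, population_share)

print(f"raw sample mean:     {raw:.2f}")
print(f"poststratified mean: {adjusted:.2f}")
```

With only two cells there is nothing to regularize; the multilevel part earns its keep when you cross many factors and the cells get sparse.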

And “collecting as much data as you can” is something you’ll want to do no matter what. Yair used MRP with tons of data to understand the 2018 election. MRP (or, more generally, RRP: regularized regression and poststratification) is a great way to correct for non-response bias using as much data as you can.

Also, I’m not quite clear what was meant by “messing around” with MRP. MRP is a statistical method. We use it, we don’t “mess around” with it, any more than we “mess around” with any other statistical method. Any method for correcting non-response bias is going to require some “messing around.”

In short, MRP is a method for adjusting for nonresponse bias and data sparsity to get better survey estimates. There are other ways of getting to basically the same answer. It’s important to adjust for as many factors as possible and, if you’re going for small-area estimation with sparse data, to use good group-level predictors.
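
As a rough illustration of why group-level predictors matter with sparse data, the sketch below shrinks each small-area proportion toward a group-level prediction. This is a simple fixed-weight shrinkage formula standing in for a fitted multilevel model, not the model itself. The areas, counts, predictions, and the pooling strength K are all hypothetical, and a real multilevel regression would estimate the amount of pooling from the data.

```python
def partially_pooled(yes, n, group_prediction, pooling_strength):
    """Shrink a raw cell proportion toward a group-level prediction.

    pooling_strength acts like a number of pseudo-observations placed at
    group_prediction: small cells move a lot, large cells barely move.
    """
    return (yes + pooling_strength * group_prediction) / (n + pooling_strength)


# Hypothetical small areas: (yes answers, respondents, group-level prediction).
# The prediction might come from, say, a regression on previous vote share.
cells = {
    "area_A": (2, 2, 0.55),      # tiny sample: raw proportion is 1.00
    "area_B": (240, 500, 0.50),  # large sample: raw proportion is 0.48
}

K = 20.0  # assumed pooling strength, fixed here purely for illustration

estimates = {
    name: partially_pooled(yes, n, prediction, K)
    for name, (yes, n, prediction) in cells.items()
}

for name, (yes, n, prediction) in cells.items():
    print(f"{name}: raw={yes / n:.3f}  pooled={estimates[name]:.3f}")
```

The two-respondent area ends up close to its group-level prediction, so a bad predictor there means a bad estimate; the 500-respondent area is driven almost entirely by its own data.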

MRP is a 1970s-era method that still works. That’s fine. Least squares regression is a 1790s-era method, and it still works too! In both cases, we continue to do research to improve and better understand what we’re doing.

4 Comments

  1. Cyrus says:

    Or more like the 1960s. See Pool, Abelson, and Popkin’s (1965) book on the 1960 presidential election.

  2. Cyrus:
    Pool et al. did poststratification, but did they do multilevel regression? Supposedly Tukey et al. did multilevel modeling for election modeling in 1960, but this doesn’t seem so relevant as I don’t think they wrote any of it up.

  3. David Shor says:

    The actual issues with producing accurate state-wide estimates in states like West Virginia and Ohio (a problem that did not go away in 2018 https://www.nytimes.com/2018/11/21/upshot/polls-2018-midterms-accuracy.html) stem from the fact that survey respondents are not exchangeable with non-survey respondents conditional on any of the covariates that people typically use in MRP workflows and have reliable small-area ground-truth for. The issue here is bias, not variance.

    The path to solving these problems involves figuring out how to collect better covariates to adjust for non-response bias than currently exist and are traditionally used in the survey industry, and more importantly, figuring out how to generate ground-truth estimates for those covariates for small areas so that any kind of post-stratification is possible. Moreover, on the regression side, as non-response bias becomes a bigger problem (because phone response rates are plummeting and we’re moving to opt-in samples), the number of covariates and model sophistication can grow to a point where the traditional approach of linear models with deep interactions no longer is computationally feasible or statistically wise.

    Obviously the status quo of people doing polls and then using raking to generate weights is crazy, and it’d be an improvement over the status quo for them to move to MRP. But that transition isn’t actually going to fix the current structural issues with political polling that exist right now.

    • Andrew says:

      David:

      Regarding your first and third paragraphs: I have not asked people about West Virginia and Ohio, but my impression was that many of the problems in the midwestern state polls in 2016 arose from not adjusting for education of respondents, and that MRP on national polls using many adjustment factors did well; see here.

      Regarding your second paragraph: This is a current area of research, to use modeling to better estimate the population distribution of covariates, and to fit better models of the outcome given the covariates. I don’t see why you say that “the traditional approach of linear models with deep interactions no longer is computationally feasible or statistically wise.” This traditional approach seemed to work well for Yair in analyzing the 2018 election.
