Multilevel Regression and Poststratification Case Studies

Juan Lopez-Martin, Justin Phillips, and I write:

The following case studies introduce users to Multilevel Regression and Poststratification (MRP) and some of its extensions, providing reusable code and clear explanations. The first chapter presents MRP, a statistical technique for obtaining subnational estimates from national surveys while adjusting for nonrepresentativeness. The second chapter extends MRP to overcome the limitation of using only variables included in the census. The last chapter develops a new approach that combines MRP with an ideal point model, making it possible to obtain subnational estimates of latent attitudes based on multiple survey questions and to improve the subnational estimates for an individual survey item using other related items.

These case studies do not display some non-essential code, such as that used to generate figures and tables. However, all the code and data are available on the corresponding GitHub repo.

The tutorials assume some familiarity with R and Bayesian statistics. A good reference for the required background is Gelman, Hill, and Vehtari (2020). Additionally, multilevel models are covered in Gelman and Hill (2006) (Part 2A) and McElreath (2020) (Chapters 12 and 13).

The case studies are still under development. Please send any feedback to [email protected].

This is the document I point people to when they ask how to do Mister P. Here are the sections:

Chapter 1: Introduction to Mister P

1.1 Data
1.2 First stage: Estimating the Individual-Response Model
1.3 Second Stage: Poststratification
1.4 Adjusting for Nonrepresentative Surveys
1.5 Practical Considerations
1.6 Appendix: Downloading and Processing Data

Chapter 2: MRP with Noncensus Variables
2.1 Model-based Extension of the Poststratification Table
2.2 Adjusting for Nonresponse Bias
2.3 Obtaining Estimates for Non-census Variable Subgroups

Chapter 3: Ideal Point MRP
3.1 Introduction and Literature
3.2 A Two-Parameter IRT Model with Latent Multilevel Regression
3.3 The Abortion Opposition Index for US States
3.4 Estimating Support for Individual Questions
3.5 Concluding Remarks
3.6 Appendix: Stan Code

This should be useful to a lot of people.
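
For readers who have not seen the second stage listed in Section 1.3, here is a minimal sketch of the poststratification arithmetic. The case studies themselves work in R and Stan; this Python version is only an illustration. The cell counts, the predicted probabilities, and the function name `poststratify` are all made up for the example. In a real analysis the probabilities would come from the fitted multilevel model and the counts from the census. The estimate for a group is the population-weighted average of the cell predictions, sum(N_j * p_j) / sum(N_j), taken over the cells j in that group.

```python
from collections import defaultdict

# Hypothetical poststratification table: one row per demographic-geographic cell.
# "n" is the census population count of the cell; "p" stands in for the
# model-predicted probability of the outcome in that cell (made-up numbers).
cells = [
    {"state": "A", "age": "18-39", "n": 600, "p": 0.30},
    {"state": "A", "age": "40+",   "n": 400, "p": 0.50},
    {"state": "B", "age": "18-39", "n": 200, "p": 0.40},
    {"state": "B", "age": "40+",   "n": 800, "p": 0.60},
]


def poststratify(cells, by=None):
    """Population-weighted average of cell predictions.

    If `by` is given, return one estimate per value of that column;
    otherwise return a single estimate under the key "all".
    """
    weighted_sum = defaultdict(float)  # sum of n * p per group
    population = defaultdict(float)    # sum of n per group
    for cell in cells:
        key = cell[by] if by is not None else "all"
        weighted_sum[key] += cell["n"] * cell["p"]
        population[key] += cell["n"]
    return {key: weighted_sum[key] / population[key] for key in weighted_sum}


state_estimates = poststratify(cells, by="state")
national_estimate = poststratify(cells)["all"]

print(state_estimates)
print(national_estimate)
```

The sketch uses a single point prediction per cell. A full Bayesian workflow would repeat the same weighted average for every posterior draw, so that the uncertainty in the cell predictions carries through to the state-level estimates.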

5 thoughts on “Multilevel Regression and Poststratification Case Studies”

  1. Hey Andrew. Long time no comment.

    This looks great and posting just to say I coincidentally put some brms code up online yesterday (for a forthcoming NYT article) that adds modeled joint distributions of 2020 vote choice to the census post-strat frame, allowing researchers to include it in the modeling stage for attitudinal variables (as per the limits in Sec. 1.5 and 2+). This may be preferable to using party because we can correct the distributions post-hoc (a la you and Ghitza 2013) to quantities we can measure in the real world, but it leaves the possibility for substantial bias in the non-voter category or within the partisan buckets.

    I am obviously not the first one to do this, but thought it might be helpful to see it made public. I have also been playing around with contextual variables for region and demographics, which I find (anecdotally) help structure those random slopes and intercepts a bit better. It is also possible to adjust the outcomes for different demographics and geographies simultaneously, but I have not done this yet. See the functions in {ccesMRPrun}. (I have also considered raking the cell predictions and n’s, but as I understand it, that could just screw up the joint distributions again.) My code also does not propagate the uncertainty from the first-stage models, which you’d want to do.

    The repo is here if you’re curious: https://github.com/elliottmorris/nyt-mrp-policy-support
    And the direct link to the acs frame with modeled vote choice is here, in case you want to explore it: https://github.com/elliottmorris/nyt-mrp-policy-support/blob/main/data/mrp/acs_psframe_with_2020vote.csv

    Elliott

  2. I’m curious if anyone has used MRP on discrete choice/categorical data. I do a lot of choice modeling, including with Stan (currently struggling to get reasonable results on a model). It seems like a good application to me because we often have online survey responses and want to weight likelihood contributions based on raked/IPF weights across demographics. The MRP example is a logistic regression, so it should be a straightforward extension.

  3. In section 1.5 you write, “As expected, our remarkably nonrepresentative sample produces estimates that are lower than what we obtained by using a random sample in the previous section.” “Lower” should be “higher”.

    In any case, awesome case study, thanks for writing it up!
