## MRP (or RPP) with non-census variables

It seems to be Mister P week here on the blog . . .

A question came in: someone was doing MRP on a political survey and wanted to adjust for political ideology, which is a variable that they can’t get poststratification data for.

Here’s what I recommended:

If a survey selects on a non-census variable such as political ideology, or if you simply wish to adjust for it because of potential nonresponse bias, my recommendation is to do MRP on all these variables.

It goes like this: suppose y is your outcome of interest, X are the census variables, and z is the additional variable (in this example, ideology). The idea is to do MRP by fitting a multilevel regression model on y given (X, z), then poststratify based on the distribution of (X, z) in the population. The challenge is that you don’t have (X, z) in the population; you only have X. So what you do to create the poststratification distribution of (X, z) is: first, take the poststratification distribution of X (known from the census); second, estimate the population distribution of z given X (most simply by fitting a multilevel regression of z given X from your survey data, but you can also use auxiliary information if available).
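To make the arithmetic concrete, here is a minimal sketch of the two steps, assuming the two regressions have already been fit and summarized as point estimates. All the numbers, array names, and helper functions (`expand_poststrat_table`, `poststratify`) are made up for illustration; in a real analysis the inputs would come from the census and from your fitted models.

```python
import numpy as np

# Hypothetical inputs, invented for illustration only.
# Census counts for each cell of X (here, 3 demographic cells).
N_x = np.array([1000.0, 3000.0, 6000.0])

# Estimated Pr(z = k | X = j), e.g. from a multilevel model of z given X.
# Rows index cells of X, columns index categories of z; each row sums to 1.
p_z_given_x = np.array([
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
    [0.5, 0.3, 0.2],
])

# Estimated E[y | X = j, z = k] from the multilevel model of y given (X, z).
theta = np.array([
    [0.8, 0.5, 0.2],
    [0.7, 0.5, 0.3],
    [0.6, 0.4, 0.1],
])


def expand_poststrat_table(N_x, p_z_given_x):
    """Spread each census cell count across the categories of z."""
    return N_x[:, None] * p_z_given_x


def poststratify(theta, N_xz):
    """Population-weighted average of the cell-level estimates."""
    return float((theta * N_xz).sum() / N_xz.sum())


N_xz = expand_poststrat_table(N_x, p_z_given_x)  # estimated counts for (X, z)
estimate = poststratify(theta, N_xz)
print(round(estimate, 3))  # 0.461
```

This version uses point estimates only. To carry the uncertainty through, you could repeat the same calculation for each posterior draw of `p_z_given_x` and `theta` and summarize the resulting draws of the estimate.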

Yu-Sung and I did this a few years ago in our analysis of public opinion for school vouchers, where one of our key poststratification variables was religion, which we really needed to include for our analysis but which is not on the census. To poststratify, we first modeled religion given demographics—we had several religious categories, and I think we fit a series of logistic regressions. We used these estimated conditional distributions to fill out the poststrat table and then went from there. We never wrote this up as a general method, though.
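The post doesn't say how the logistic regressions were combined into a distribution over religious categories. One common way to do it, offered here as an assumption rather than as what was actually done, is to fit them sequentially: the first regression gives the probability of category 1, the second gives the probability of category 2 among those not in category 1, and so on. The hypothetical helper below turns those conditional probabilities into category probabilities that sum to 1 within each poststratification cell.

```python
import numpy as np


def sequential_to_category_probs(q):
    """Convert sequential binary probabilities into category probabilities.

    q[..., k] is Pr(category k | not in categories 0..k-1) for k = 0..K-2.
    Returns an array with K categories on the last axis; the last category
    gets whatever probability is left over.
    """
    q = np.asarray(q, dtype=float)
    # remaining[..., k] = Pr(not in any of categories 0..k-1)
    ones = np.ones(q.shape[:-1] + (1,))
    remaining = np.concatenate([ones, np.cumprod(1.0 - q, axis=-1)], axis=-1)
    head = q * remaining[..., :-1]   # first K-1 categories
    tail = remaining[..., -1:]       # leftover mass for the last category
    return np.concatenate([head, tail], axis=-1)


# One demographic cell, three categories (numbers invented for illustration).
probs = sequential_to_category_probs([[0.5, 0.4]])
print(probs)  # [[0.5 0.2 0.3]]
```

Each row of the output can then serve as the estimated distribution of the non-census variable within that cell when filling out the poststrat table.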

1. Eric says:

Would it be reasonable to fit both multilevel regression models (y given (X, z) and z given X) at the same time in a single Stan run? Would there be any benefits?

2. Jeff Lax says:

Here is a paper doing it for party:
http://www.columbia.edu/~jrl2124/klp2_paper.pdf

Polarizing the Electoral Connection: Partisan Representation in Supreme Court Confirmation Politics
Jonathan P. Kastellec, Jeffrey R. Lax, Michael Malecki, and Justin H. Phillips. The Journal of Politics, Vol. 77, No. 3 (July 2015).

And another:
http://www.columbia.edu/~jrl2124/partypurse.pdf

The first propagates the uncertainty in the first stage throughout; the second also propagates the uncertainty but not in the linked version.

3. Mike H says:

Wrt religion, why model it with census data? Why not use the GSS instead? It includes questions about religion.

• Andrew says:

Mike:

Using GSS for religion is fine but then you’ll need to do some modeling if you want to use it for MRP, because (a) sample size is way too small to take raw numbers and treat them as population proportions, (b) GSS doesn’t give you the state where the respondent lives, and (c) even if GSS did give state, the data for each state wouldn’t be representative as GSS uses cluster sampling.

4. Devin says:

Two papers that address closely related problems are:

Leemann, Lucas, and Fabio Wasserfallen. 2017. “Extending the Use and Prediction Precision of Subnational Public Opinion Estimation.” American Journal of Political Science 61 (4): 1003–1022.

and

Caughey, Devin and Mallory Wang. Forthcoming. “Dynamic Ecological Inference for Time-Varying Population Distributions Based on Sparse, Irregular, and Noisy Marginal Data.” Political Analysis. https://www.dropbox.com/s/hdj1owa73x4jwx8/CaugheyWang-PopEst.pdf?dl=0