Doing Mister P with multiple outcomes

Someone sends in a question:

I’ve been delving into your papers and blog posts regarding MRP. The resources are really great – especially the fully worked example you did in collaboration with Juan Lopez-Martin and Justin Phillips.

The approach is really nice when you want to estimate just a single parameter of interest from a survey, such as support for a policy. However, I’m wondering whether you’ve also had to deal with situations where you want to compare support for one policy vs. another, where both were asked in the same survey of the same respondents? Media reporting that compares levels of support for different things often seems to be based on numbers from separate models run on each question, even though the data might have been collected from the same people in a single survey.

Having run some MRP, I can see that if you want to also add in repeated measures of respondents, the computational complexity can really balloon (in fact, I recently melted part of my computer trying to do this with an ordinal model!). But I also assume it is not totally valid to take data from the same respondents, run one MRP model on ‘Support for X’ and another model on ‘Support for Y’ (or on different framings of the same issue), and compare the posterior distributions of the level of support from these separate models, because that would be treating them as independent responses. Life would be much easier if this were an acceptable approach, but I am not sure that it is!

Is my sense above incorrect, or should one instead incorporate the repeated measurement into the regression equation? I thought it might be possible to do so by changing the type of equation you have in your MRP primer (fit) to something like fit2: I add ‘question’ as an effect that, like the intercept, can vary across demographic subgroups, and I add respondentID as another way in which the data is nested. Is this anything like how you would deal with this?

fit <- stan_glmer(abortion ~ (1 | state) + (1 | eth) + (1 | educ) + male +
  (1 | male:eth) + (1 | educ:age) + (1 | educ:eth) + repvote + factor(region), ...)

fit2 <- stan_glmer(response ~ question + (1 + question | state) + (1 + question | eth) +
  (1 + question | educ) + question:male + (1 + question | male:eth) +
  (1 + question | educ:age) + (1 + question | educ:eth) + question*repvote +
  question*factor(region) + (1 | respondentID), ...)

This seems like something that ought to be considered in opinion polling/survey research, and it is interesting to think about how best to address it. The main things I was wondering about were the levels at which different things should be nested, and, at a practical level, whether, if I generate expected responses with a function such as posterior_epred, I can just put new respondent IDs in the poststrat table and assume that randomly drawing different ‘participants’ across the different posterior draws will balance out the different possible respondent intercepts that might be drawn. As a note, I also see that one could sometimes instead do MRP on difference or change scores, but this is not really possible with certain response formats or with many different items from the same subject.
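For concreteness, here is a minimal sketch of the poststratification step being asked about, assuming the fit2 model above and a hypothetical poststratification table poststrat with one row per cell (including a question column) and a column N of population counts; it relies on rstanarm drawing varying intercepts for unseen respondentID levels from their estimated population distribution:

library(rstanarm)

# give each poststrat cell a fresh, unseen respondent ID so that new
# individual-level intercepts are drawn rather than reusing fitted ones
poststrat$respondentID <- paste0("new_", seq_len(nrow(poststrat)))

# matrix of posterior draws (rows) by poststrat cells (columns)
epred <- posterior_epred(fit2, newdata = poststrat)

# population-weighted average over cells: one poststratified estimate per draw
mrp_draws <- epred %*% poststrat$N / sum(poststrat$N)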

My reply:

If you have multiple related questions on the same survey, then one approach is to fit an ideal-point model. I thought we had an ideal-point model in that case study, but now I don’t see it there—I wonder where it went? But the ideal-point model really only makes sense when the different questions are measuring the same sort of thing; if you’re interested in responses on two different issues, that’s another story. I agree that modeling the two responses completely separately is potentially wasteful of information.
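For reference, a minimal sketch of what a two-parameter ideal-point model could look like in brms, roughly along the lines of Bürkner’s IRT tutorial; the long-format data frame d and its columns response, respondentID, and question are hypothetical, and identification constraints are glossed over:

library(brms)

# P(response = 1) = inv_logit(exp(logdisc) * (theta - diff)):
# one latent ideal point theta per respondent, plus a discrimination
# and a difficulty per question
f_irt <- bf(
  response ~ exp(logdisc) * (theta - diff),
  theta ~ (1 | respondentID),   # demographic predictors for MRP would go here
  logdisc ~ (1 | question),
  diff ~ (1 | question),
  nl = TRUE
)

fit_irt <- brm(f_irt, data = d, family = bernoulli("logit"))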

Your proposed fit2 model is similar to an ideal-point model, but it has the problem that the responses to the two questions are assumed to be independent. You could think of it as two separate models with some pooling of coefficients between them.

What to actually do? I’m not sure. I think I’d start by just fitting two separate models; then you could look at the correlation of the residuals, and if there’s nothing there, maybe it’s ok to go with the separate models. If the residuals do show some correlation, then more needs to be done. Another approach, if the two questions have a logical order, is to first model y1 given x and then model y2 given y1 and x.
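A rough sketch of both suggestions, with hypothetical binary outcomes y1 and y2 in a data frame survey:

library(rstanarm)

# fit the two outcomes separately
fit_y1 <- stan_glmer(y1 ~ (1 | state) + (1 | eth) + (1 | educ) + male,
                     family = binomial("logit"), data = survey)
fit_y2 <- stan_glmer(y2 ~ (1 | state) + (1 | eth) + (1 | educ) + male,
                     family = binomial("logit"), data = survey)

# quick diagnostic: correlation of the response residuals; if it is
# near zero, the separate models may be adequate
cor(survey$y1 - fitted(fit_y1), survey$y2 - fitted(fit_y2))

# if the questions have a logical order, model y2 given y1 and x
fit_seq <- stan_glmer(y2 ~ y1 + (1 | state) + (1 | eth) + (1 | educ) + male,
                      family = binomial("logit"), data = survey)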

P.S. I asked my correspondent if I could post the above question, and he replied, “Yes, but please don’t put any reference to my institution in there (not like it is secret, but just would have to clear it with people if it is included!).” So his identity will remain secret.

4 thoughts on “Doing Mister P with multiple outcomes”

  1. Maybe I don’t fully understand the problem, and I unfortunately don’t know what an ideal-point model is, but could you use a multilevel multivariate model, sensu brms:

    brm(bf(mvbind(Y1, Y2,…,Yp) ~ (1 | survey variables)) + set_rescor(TRUE)),

    where Y1, Y2,…,Yp contain the observed responses to each question of interest, p? If residuals at observation level can’t be directly modeled with a covariance matrix (e.g., bernoulli response?), you could still let the varying effects above the observation-level be correlated. Something like:
    mvbind(Y1, Y2,…,Yp) ~ (1 | a | state) + (1 | b | eth) + (1 | c | educ) + …,

    where ‘a’, ‘b’, ‘c’, etc. indicate correlated latent intercepts between the common units across the responses?
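    A runnable version of that idea, assuming hypothetical binary outcomes Y1 and Y2 in a data frame d; the shared ‘a’, ‘b’, ‘c’ identifiers tell brms to model the group-level intercepts as correlated across the two responses:

      library(brms)

      f_mv <- bf(mvbind(Y1, Y2) ~ (1 | a | state) + (1 | b | eth) + (1 | c | educ))

      # bernoulli responses have no residual correlation parameter, so the
      # dependence between Y1 and Y2 enters only through the correlated
      # group-level intercepts
      fit_mv <- brm(f_mv, data = d, family = bernoulli("logit"))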

  2. I’ve also found this to be a practical disadvantage of MrP. Even with a single scalar outcome Y, one may want to compute changes in the CDF. This is straightforward with IPW or with post-stratification alone.

    • Dean:

      We need MRP (or, more generally, RPP). “Post-stratification alone” doesn’t do the job because it requires that we either adjust for very few variables or else have a very noisy adjustment.

    • Question for Andrew: I had understood your 2020 paper (https://arxiv.org/pdf/1707.08220.pdf) to have a procedure for computing weights that are equivalent to a given MRP estimate. Could weights derived from MRP on one outcome be used for the other outcomes (especially in Dean’s example, where there’s a primary outcome)?
