Dispelling confusion about MRP (multilevel regression and poststratification) for survey analysis

A colleague pointed me to this post from political analyst Nate Silver:

At the risk of restarting the MRP [multilevel regression and poststratification] wars: For the last 3 models I’ve designed (midterms, primaries, now revisiting stuff for the general) trying to impute how a state will vote based on its demographics & polls of voters in other states is only a mediocrely accurate method.

It’s a decent stand-in when you have few polls and weak priors. Our models do use a little bit of it. But generally speaking, looking directly at the polls in a state is quite a bit more accurate. And often, so are simpler “fundamentals” methods (e.g. national polls + PVI).

Part of the issue is that seemingly demographically similar voters in different states may not actually vote all that similarly. e.g. a 46-year-old Hispanic man in California probably has different views on average than a 46-year-old Hispanic man in Idaho or Florida…

That’s partly because there are likely some unobserved characteristics (maybe the voter in Florida is Cuban-American and the one in California is Mexican-American) but also because states have different political cultures and are subject to different levels of campaign activity.

MRP does partial pooling. If you are interested in estimating public opinion in the 50 states, you’ll want to use all available information, including state polls and also national polls. To do this you’ll need a model of time trends in national and state opinions. That’s what Elliott, Merlin, and I do here. We don’t use MRP (except to the extent that the individual polls use MRP to get their topline estimates), but if we had the raw data from all the polls, and we had the time to devote to the project, we would do MRP on the raw data.
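
To give a sense of what I mean by a model of time trends, here’s a minimal sketch in Python. This is a toy illustration with made-up numbers, not the actual model behind our forecast: national opinion follows a random walk, and each state tracks the national trend plus its own slowly drifting offset.

```python
# Toy sketch of a time-trend structure (made-up numbers, not the
# actual forecast model): national opinion is a random walk, and
# each state is the national trend plus its own drifting offset.
import numpy as np

rng = np.random.default_rng(0)
n_days, n_states = 100, 50

# National support drifts day to day around a starting value.
national = 0.52 + np.cumsum(rng.normal(0, 0.002, size=n_days))

# Each state = national trend + a persistent lean that also drifts.
offsets = rng.normal(0, 0.05, size=n_states)   # baseline state leans
drift = np.cumsum(rng.normal(0, 0.001, size=(n_days, n_states)), axis=0)
state_support = national[:, None] + offsets + drift

# A poll on day t in state s is then modeled as a noisy observation
# of state_support[t, s]; the shared national trend is what lets
# polls in one state inform estimates in another.
print(state_support.shape)  # (100, 50)
```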

Because MRP does partial pooling of different sources of information, it should not do any worse than any one piece of this information.

So when Nate writes that “looking directly at the polls in a state is quite a bit more accurate” than MRP, I’m pretty sure that he’s just doing a crappy version of MRP.

A crappy version of any method can do really badly, I’ll grant you that.

The most useful step at this point would be for Nate to share his MRP code and then maybe someone could take a look and see what he’s doing wrong. Statistics is hard. I’ve made lots of mistakes myself (for example, here’s the story of an embarrassing example in polling analysis from a few years back).

There indeed are examples where MRP won’t help much, if the separate estimates from each state have low bias and low uncertainty. But in that case MRP will just do very little partial pooling; it should not perform worse than the separate estimates.
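
To see why, consider the simplest version of partial pooling: a precision-weighted average in the normal-normal case. This is a toy sketch with made-up numbers (a real MRP model is a multilevel logistic regression, not this one-liner), but it shows how a precise state estimate is left nearly alone while a noisy one gets pulled toward the prior:

```python
def partial_pool(state_est, state_se, prior_mean, prior_sd):
    """Precision-weighted compromise between a state's own estimate
    and the prior implied by the rest of the data."""
    w_state = 1.0 / state_se ** 2    # precision of the state estimate
    w_prior = 1.0 / prior_sd ** 2    # precision of the prior
    return (w_state * state_est + w_prior * prior_mean) / (w_state + w_prior)

# A precisely estimated state barely moves toward the prior ...
print(partial_pool(0.54, 0.005, 0.50, 0.03))  # ~0.539
# ... while a noisy estimate is pulled much closer to it.
print(partial_pool(0.54, 0.05, 0.50, 0.03))   # ~0.511
```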

Similarly for the fundamentals-based models that Nate mentions. Our forecast partially pools toward the fundamentals-based models. More generally, if the fundamentals-based models predict well, that should just help MRP. But you do have to include that information in the model: MRP, like any statistical method, is only as good as the information that goes into it.

Cooperation, not competition

I feel like the problem is that Nate is seeing different methods as competing rather than cooperating. From his perspective, it’s MRP or state polls or fundamentals. But it doesn’t have to be one or the other. We can use all the information!

To put it another way: choosing a statistical method or a data source is not like an election where you have to pick the Democrat or the Republican. You can combine information and partially pool. The details of how to do this can get complicated, and it’s easy enough to get it wrong—that’s been my experience. But if you’re careful, you can put the information together and get more out of it than you’d learn from any single source. That’s why people go to the trouble of doing MRP in the first place.

I’m not saying that Nate should go use MRP

There are many roads to Rome. What’s important about a statistical method is not what it does with the data, so much as what data it uses. Nate’s done lots of good work over the years, and if he can manage to use all the relevant information in a reasonable way, then he can get good answers, whether or not he’s using a formal Bayesian adjustment procedure such as MRP.

Indeed, I’d rather that Nate use the method he’s comfortable with and do it well, than use some crappy version of MRP that gives bad answers. I do think there are advantages of using MRP—at some point, doing multiple adjustments by hand is like juggling plates, and you start to have issues with impedance matching—but it also can be hard to start from scratch. So I can accept Nate’s argument that MRP, as he has tried to implement it, has problems. The point of this post is just to clear up misunderstandings that might arise. If you do MRP right, you should do better than any of the individual data sources.

People are different. Your model should account for that.

Nate correctly writes that “a 46-year-old Hispanic man in California probably has different views on average than a 46-year-old Hispanic man in Idaho or Florida.” Yup. Part of this is that the average political positions in California, Idaho, and Florida differ. In a basic MRP model with no interactions, the assumption would not be that middle-aged Hispanics vote the same on average in each state. Rather, the assumption would be that the average difference (on the logistic scale) between Hispanics and non-Hispanic whites would be the same in each state, after adjusting for other demographics such as age, education, and sex that might be included in the model. That said, in real life the average difference in attitudes, comparing Hispanics to non-Hispanic whites with the same demographics in the same state, will vary by state, and if you think this is important you can (indeed should) include it in your MRP model.
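
Here’s a minimal sketch of that structure with made-up coefficients (an illustration of the assumptions, not a fitted model): the additive part fixes the Hispanic-vs-white gap on the logit scale across states, and the interaction term is what you add to let the gap itself vary:

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Additive model: the Hispanic-vs-white gap on the logit scale is the
# same everywhere, but predicted probabilities still differ by state
# because the state effects differ.
state_eff = {"CA": 0.8, "ID": -0.9, "FL": 0.1}   # made-up state effects
hispanic_gap = 0.5                                # common logit-scale gap

# Interaction term: lets the gap itself vary by state, which is what
# you add if you think (say) Florida's Hispanic electorate differs.
interaction = {("FL", "hispanic"): -0.4}          # made up

for state in state_eff:
    logit = state_eff[state] + hispanic_gap \
            + interaction.get((state, "hispanic"), 0.0)
    print(state, round(inv_logit(logit), 3))
```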

The relevant point here is that you should be able to directly feed your substantive understanding of voters into the construction of your MRP model. And that’s how it should be.
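
And once the regression gives you a predicted probability for each demographic cell, the poststratification step is just a population-weighted average over the cells. A minimal sketch, with made-up cell probabilities and counts:

```python
# Each cell: (predicted support from the regression, census count).
# All values here are made up for illustration.
cells = [
    (0.79, 4_000_000),   # e.g., Hispanic men, one age/education group
    (0.48, 9_000_000),   # e.g., white men, same age/education group
]
state_estimate = sum(p * n for p, n in cells) / sum(n for _, n in cells)
print(round(state_estimate, 3))  # 0.575
```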

P.S. Zad sends in the above photo showing us what’s really inside the black box.

14 thoughts on “Dispelling confusion about MRP (multilevel regression and poststratification) for survey analysis”

  1. I read what Nate wrote differently. I don’t think he was writing that MRP as a technique will do worse given all the data. I imagine what he was trying to say is that state polls alone provide more information than all the polls from other states aggregated. I don’t know how he did this, but the analysis I imagined when reading his tweets was: compare an in-state-polls-only model with an MRP model that uses every state but the state in question. Repeat for all 50 states. Plot accuracy for each method vs. quantity of polling. If one sees that, in all but the least-polled states, polls-only outperforms, one might write something like Nate did.

  2. [One month later] I still wonder how the reader should interpret the different elements in the “modeled popular vote on each day” chart. From the explanatory article it seems that these “95% confidence intervals” should be interpreted probabilistically, but the probability of what precisely? It’s not very clear, in particular for the part of the chart to the right of the current date.

    The article ends with: “And if our stated probabilities do wind up diverging from the results, we will welcome the opportunity to learn from our mistakes and do better next time.” What does it mean for the results and your stated probabilities to diverge?

    [The link to the forecast brought this question to my mind again, but I understand it is not as relevant to this post as it was to the June 19 post.]

    • Carlos:

      The uncertainty ranges are 95% posterior probabilities for support for each of the two candidates on each day. It’s all conditional on the model. Regarding “what does it mean for the results and your stated probabilities to diverge,” see Figure 2 of this paper.

      • Thanks. Regarding the second point, I think my issue was with the formulation “if our stated probabilities do wind up diverging from the results” making a reference to “stated probabilities” rather than “predictions” or “forecasts”. It was not clear to me that “stated probabilities” meant “forecast of vote share” and I thought it was related to the probability of some outcome.

        A few comments about the 95% confidence intervals:

        The central forecast and the upper and lower bounds between today and election day wiggle up and down. I think it would be better to smooth those lines; otherwise you’re committing the graphical equivalent of violating the “thou shalt not write numbers with nonsensical precision” commandment.

        It seems that the width also varies, apparently due in part to the meaningless noise introduced by the simulation, but overall it is surprisingly stable. The width seems to increase slightly first and decrease later, but it is not clear.

        If the model “pulls” the forecast toward some “fundamental” forecast, one could expect that the probability mass above 57% for Biden would be pulled down while the probability mass below 53% would be pulled up (assuming the “target” is within that range, as it seems from the chart).

        If paths are thus attracted to some “forecast based on non-polling data,” the probability would concentrate. On the other hand, uncertainty should increase as we go further out. Maybe both “narrowing” and “widening” are present in the model and they offset each other?

        Say that, according to the model:

        a) there is 95% probability that support for Biden tomorrow is in the 51.8%-57.8% range

        b) there is 95% probability that support for Biden on October 18 is in the 51.4%-57.4% range

        Do you believe those probabilities to the same extent? Do you find the model equally reasonable at the one-day and the three-month horizons? If you had to bet on one of those statements (assuming the true support at the time could be determined to settle the bet), would you be indifferent between them?

  3. > That said, in real life the average difference in attitudes, comparing Hispanics to non-Hispanic whites with the same demographics in the same state, will vary by state,

    I would guess that comparisons among Hispanic males with *different* demographic profiles in the same state will also vary state by state. IOW, maybe age differences are more explanatory in one state whereas education differences are in another. If so, how do you weight different parameters (e.g., ethnicity vs. age vs. income vs. education level) in a way that allows for an appropriate variance from one state to the next? Or do you just assume it works OK to weight those variables relative to each other in the same way across different states?

    • Joshua:

      Yes, there are many potential interactions; see this article. No model will capture everything, so in practice we think of these as average adjustments, for example adjusting for average differences between Hispanics and non-Hispanic whites of the same sex, age, and education categories. Nate’s comment was missing the point that this adjustment does not assume that the voting patterns of 46-year-old Hispanic men are the same in every state. Again, this is not to say that regression models are perfect. But, as we saw in 2016, raw data are not so perfect either.
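
      As a toy illustration of letting the adjustments themselves vary by state (made-up numbers, not a fitted model): give each state its own education slope, drawn around a common mean. In a fitted multilevel model, the between-state spread is estimated from the data rather than fixed in advance.

```python
import numpy as np

rng = np.random.default_rng(1)
states = ["CA", "ID", "FL"]

# Common mean and between-state sd for, say, an education effect
# (both made up; in a fitted model these are estimated from data).
mu_edu, tau_edu = 0.3, 0.15
edu_slope = {s: rng.normal(mu_edu, tau_edu) for s in states}

# If tau_edu is estimated near zero, the slopes shrink toward mu_edu
# (the "same adjustment in every state" case); if the data show real
# state differences, the slopes spread out.
for s, b in edu_slope.items():
    print(s, round(b, 3))
```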

  4. Either Silver’s wrong or he’s found an empirical loophole to a theoretical proof. He should write it up (perhaps using simulated data so as to safeguard his proprietary work) and blow everyone away, like any good statistician. Unless he was just being one of those unreliable pundits he wrote a book about.

    • Michael:

      I think what happened is that Nate has tried some partial pooling and it hasn’t worked so well for him. That’s why I invited him to share his code, as then maybe people could point out how he could do better.

      Statistical methods can be tricky, and it’s easy to mess up if you try to implement them from scratch. Just for example, I’ve read about deep learning and how cool it is.

      But if I tried to fit a deep learning model on my own, I’m sure there are all sorts of ways I could go wrong, all sorts of tacit knowledge that I’d not be including that would lead to me getting bad results. The problem wouldn’t be with deep learning, it would be with my implementation of deep learning.

      Now, at this point, you could criticize MRP, or deep learning, or some other cutting-edge method, as not being fully mature, in the sense that you can’t just push a button; you have to be careful in setting up your problem or you can get bad results. And that would be a valid criticism. But that’s just the way it is with new methods. It takes a while to set them up so that they are easy for outsiders to use.

      To return to a point that has come up over and over again on this blog: none of us is an island when it comes to statistical methodology, and any of us can be made stronger by sharing our methods and code and opening ourselves up to criticism. By hiding behind his Twitter wall, Nate’s not making use of that opportunity. If he were to share his code, he could do better. It’s too bad, really.

  5. I agree with the points you’ve made, but this post does feel a bit rushed/sloppy. I wish you’d have taken a bit more time to write it, as it really gets at the heart of something that Nate has started to do. As you say, he isn’t even talking about MRP in his critique if you look at it even a little closely. He’s really just saying something that could be said about any technique or any badly specified model. So why is he even taking that tack at all? It’s borderline misinformation, and I think it’s worth asking why he’s putting it out.

    • Anon:

      See my comment to Michael here. I have the impression that Nate tried some partial pooling himself and it didn’t work well, and he’s attributing the failure to the method rather than his implementation of it. I wrote this post because Nate has a lot of readers, and I’m concerned that they’d read his post and think there’s something wrong with MRP.

      But, yeah, MRP, like any sophisticated method, can screw up, and more research is needed on understanding these sorts of problems.

  6. I really like Nate Silver and his work, but I’ve had a central problem with him for over a decade now: his work is all proprietary.

    Being in a commercial space (rather than an academic space), he rather HAS to keep his models secret. If others can just run his models, he loses traffic to his site. His primary goal is NOT to advance our knowledge of good political modelling, and so traffic to his site takes a much higher priority.

    But that leaves us not really knowing what he is doing. To my knowledge, there is no context in which he opens up the details of his models to criticism by outsiders, not even small groups under some sort of NDAs (formal or otherwise).

    This makes it harder to learn from his work, harder for him to learn from others, and leaves him more ignorant than he might be otherwise. That ignorance appears to be on display here, and it matters because of the prominence of his platform.
