
Wanna know what happened in 2016? We got a ton of graphs for you.

The paper’s called “Voting patterns in 2016: Exploration using multilevel regression and poststratification (MRP) on pre-election polls.” It’s by Rob Trangucci, Imad Ali, Doug Rivers, and me, and here’s the abstract:

We analyzed 2012 and 2016 YouGov pre-election polls in order to understand how different population groups voted in the 2012 and 2016 elections. We broke the data down by demographics and state and found:
• The gender gap was an increasing function of age in 2016.
• In 2016 most states exhibited a U-shaped gender gap curve with respect to education indicating a larger gender gap at lower and higher levels of education.
• Older white voters with less education more strongly supported Donald Trump versus younger white voters with more education.
• Women more strongly supported Hillary Clinton than men, with young and more educated women most strongly supporting Hillary Clinton.
• Older men with less education more strongly supported Donald Trump.
• Black voters overwhelmingly supported Hillary Clinton.
• The gap between college-educated voters and non-college-educated voters was about 10 percentage points in favor of Hillary Clinton.
We display our findings with a series of graphs and maps. The R code associated with this project is available at
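Every finding in the abstract comes out of the poststratification half of MRP. The multilevel regression produces an estimate for each demographic cell, and those cell estimates are then averaged, each weighted by how much of the population it represents. The sketch below shows only that averaging step, in Python, and assumes the cell estimates have already been fitted. The project's own code is in R. The `Cell`, `poststratify`, and `gender_gap_by` names, the cell definitions, and all the numbers are hypothetical, made up for illustration. Weighting by a turnout probability is one assumed way to target voters rather than all adults.

```python
# Minimal poststratification sketch (hypothetical cells and numbers, not the
# paper's data, model, or R code).
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One demographic cell; all fields are hypothetical illustrations."""
    sex: str          # "F" or "M"
    age: str          # age bracket label
    n_pop: float      # assumed census count of adults in the cell
    p_turnout: float  # assumed modeled probability of voting
    p_clinton: float  # assumed modeled Clinton share among voters in the cell


def poststratify(cells, key, use_turnout=True):
    """Weighted mean of p_clinton within each group given by key(cell).

    Each cell is weighted by its population count. When use_turnout is True,
    that count is also multiplied by the cell's turnout probability (an
    assumption made here for illustration).
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for c in cells:
        w = c.n_pop * (c.p_turnout if use_turnout else 1.0)
        g = key(c)
        num[g] += w * c.p_clinton
        den[g] += w
    return {g: num[g] / den[g] for g in num}


def gender_gap_by(cells, group_key, use_turnout=True):
    """Women's minus men's poststratified Clinton share within each group."""
    est = poststratify(cells, lambda c: (group_key(c), c.sex), use_turnout)
    groups = {g for g, _ in est}
    return {g: est[(g, "F")] - est[(g, "M")] for g in groups}


# Made-up cell estimates, standing in for the output of a multilevel regression.
cells = [
    Cell("F", "18-29", 100, 0.5, 0.70),
    Cell("M", "18-29", 100, 0.4, 0.60),
    Cell("F", "65+", 80, 0.8, 0.55),
    Cell("M", "65+", 70, 0.8, 0.40),
]

overall = poststratify(cells, key=lambda c: "all")["all"]
gap_by_age = gender_gap_by(cells, group_key=lambda c: c.age)
print(f"overall Clinton share among voters: {overall:.3f}")
for age, gap in sorted(gap_by_age.items()):
    print(f"gender gap, ages {age}: {gap:+.2f}")
```

A real MRP analysis would look different in two ways. Each cell would be a state × demographic combination with census counts, and the cell estimates would come from the fitted multilevel model together with their uncertainty, for example by poststratifying once per posterior draw. This sketch only shows the weighting arithmetic.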

There’s a lot here. I mean, a lot. 44 displays, from A:

to Z:

And all sorts of things in between:



  1. Alan says:

    No discussion of turnout? I’m not sure this research has any applicable value if it doesn’t examine the possible effects of turnout.

  2. Carlos Ungil says:

    > The code used to run the multilevel regression and poststratification from start to finish can be found here. All the data necessary to run the code is on the repo.

    Maybe I’m doing something wrong?

    # git clone
    # cd mrp_2016_election/mrp
    # R -f R_code/2016_mrp.R
    cannot open compressed file ‘data/poll_2016_modeling_frame.RDS’, probable reason ‘No such file or directory’

    The missing file seems to be generated by R_code/polling_2016.R, but that script needs to load("poll.Rdata"), which is also missing.

  3. a reader says:

    Wow, I think this is great!

    To be completely honest, I was skeptical of what I think is your preferred mode of data analysis: fit a lot of models and present them all. Basically, I worried it would be information overload and the reader would walk away more confused than when they started. But this report has totally made me a convert. In too many papers, the results are so abbreviated that it’s hard to even know which covariates were included in the models presented. Here, we can dive as deep as we want into the analysis of the data (even to the point of Carlos trying to rerun the full analysis)… or just skim the highlights to see if we are interested.

    May this be the new standard! Maybe it is and I’ve just been reading papers in the wrong fields?

  4. Julien says:

    It might be my European sensibility, and I absolutely don’t want to launch a debate on racism and so on, but it is very surprising to me that you are using “race” as a determinant. My objection is not on moral grounds (or only a little) but on statistical grounds. You always talk about measurement, and I think self-declared race is a really poor measure of what is actually going on. My guess is that a lot of confounding factors that really influence voting patterns are packed together under the variable “race”. Wouldn’t it be better (and more interesting) to try to measure and include these confounding factors directly? Do you have any thoughts on this?

    • Andrew says:


      Also, I think we might do better to measure wealth rather than income. We use the survey questions that everyone else uses. I guess it’s easier to just ask people about race or ethnic background than to ask a bunch of other questions whose answers might not be so stable over time. But you may well be right that we could do better.

  5. Tobias says:

    I assume it’s not possible for you to release the YouGov polling data. But would it be possible for you to publish your model outputs, i.e. the year x state x demographic-level estimates for likelihood to vote and for likelihood to vote for the Republican candidate?

  6. Jonathan (another one) says:

    44 displays from A to Z? Seems like that violates the pigeonhole principle, but maybe you have some extra characters from other languages. I hate it when posts don’t give me enough information to reproduce the logic.
