Comparing election outcomes to our forecast and to the previous election

by Andrew Gelman and Elliott Morris

Now that we have almost all the votes from almost all the states, we can step back and answer two questions:

1. How far off were our predictions?

2. How did Joe Biden’s performance compare to Hillary Clinton’s four years earlier?

How far off were our predictions?

Here’s what we have so far:

The segments show 95% predictive intervals for each state. Coverage isn’t too bad, but that’s a matter of luck: had Biden done a little bit better, his results would’ve been inside all 50 of the intervals; had he done a bit worse, many of the state intervals would’ve missed the mark. These 50 outcomes are highly correlated, so you can’t expect to measure calibration as if they were 50 independent forecasts.
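To make the "luck" point concrete, here is a minimal simulation sketch (with made-up error scales, not the numbers from our model): when the 50 state errors share a national component, coverage in any single election can land far above or below 95% even if the intervals are calibrated on average.

```python
# Sketch: 50 state forecasts whose errors share one national component.
# All error scales here are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_sims = 50, 10_000
sigma_national, sigma_state = 0.02, 0.01          # assumed: 2-point national, 1-point state error
half_width = 1.96 * np.hypot(sigma_national, sigma_state)   # marginal 95% interval half-width

coverage = []
for _ in range(n_sims):
    national_shift = rng.normal(0, sigma_national)            # shared across all states
    state_errors = national_shift + rng.normal(0, sigma_state, n_states)
    coverage.append(np.mean(np.abs(state_errors) <= half_width))

# Average coverage is near 95%, but in any single simulated "election" the
# per-state coverage is often near 0 or 1: the shared error moves states together.
print(np.mean(coverage), np.std(coverage))
```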

And here are the differences between outcome and predictions:

The above vote counts are provisional (for example, the Pennsylvania number will be changing), and we’ll be getting data from California, New York, Massachusetts, and a few other states; but you get the idea.

The prediction was off in Florida, but Florida was not the worst-predicted state. Not even the worst-predicted swing state.

The average prediction was off by about 2.5 percentage points, which can happen in the real world of polling, even though each of these estimates is based on the average of many polls. The polls are not supposed to be off by that much, which is why we said that the polls messed up, and our forecast failed (in some part) by not correcting for these errors. (Yes, we accounted for them in the minimal way by including a nonsampling error term that had the effect of spreading out our intervals, but you'd really like to correct for the bias, not just account for its possibility.)

As we’ve discussed elsewhere, we can’t be sure why the polls were off by so much, but our guess is a mix of differential nonresponse (Republicans being less likely than Democrats to answer, even after adjusting for demographics and previous vote) and differential turnout arising from on-the-ground voter registration and mobilization by Republicans (not matched by Democrats because of the coronavirus) and maybe Republicans being more motivated to go vote on election day in response to reports of 100 million early votes.

How did Joe Biden’s performance compare to Hillary Clinton’s four years earlier?

Here’s what happened:

Setting aside Illinois and Hawaii, we could’ve forecast the election pretty much as well by taking Clinton’s vote and adding 2 percentage points for every state, as by doing all our poll analysis. This is not to say that the polls are worthless, just that there’s a lot of information already available from the past elections, along with fundamentals-based predictions that tell us roughly where to expect the national election to go. Polls provide additional information to be included in the Bayesian average; also we should do a better job of trying to anticipate polling errors.
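Here is a minimal sketch of what that uniform-shift baseline looks like in code, using placeholder numbers rather than the actual state data, just to show the comparison we have in mind:

```python
# Sketch of the "previous election plus a constant swing" baseline described above.
# All values below are hypothetical placeholders, not the real 2016/2020 shares.
import numpy as np

clinton_2016 = {"PA": 0.497, "WI": 0.494, "FL": 0.494}   # hypothetical two-party shares
biden_2020   = {"PA": 0.510, "WI": 0.506, "FL": 0.483}   # hypothetical outcomes
model_pred   = {"PA": 0.528, "WI": 0.535, "FL": 0.513}   # hypothetical poll-based forecasts

states = sorted(clinton_2016)
baseline = {s: clinton_2016[s] + 0.02 for s in states}    # uniform +2 point swing

def mean_abs_error(pred):
    return np.mean([abs(pred[s] - biden_2020[s]) for s in states])

print("uniform-swing baseline MAE:", mean_abs_error(baseline))
print("poll-based forecast MAE:   ", mean_abs_error(model_pred))
```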

We also made this graph comparing polling errors in 2016 and 2020:


Some of the polling errors were happening in the same places.

In summary

In 2016, Clinton got 51% of the two-party vote. This time, it looks like Biden will end up with 52% or 53%. From the perspective of the long history of U.S. presidential races, this would be considered a close election. From the perspective of recent polarized America, it's a strong victory, only the second time this century that a presidential candidate has received more than 52% of the two-party vote. Based on news reports so far, we're guessing that the votes for Congress were closer to 50/50 with only a very narrow edge for the Democrats.

96 thoughts on "Comparing election outcomes to our forecast and to the previous election"

  1. Great diagnostics.

    Very interesting how even the swing from 2016 was. In the UK election analysis is based on forecasting each individual constituency in order to determine the majority party in Parliament. Most of the analysis is based on a uniform swing — adding the current difference in national polls to the party share in each district. They then talk about local effects that may adjust from here.

    This isn't done much in the US, where we have fewer units (states) and polls concentrated in swing states. However, it looks like it might be almost as good, with far fewer parameters, as a state-level poll model.

    This effect may be a consequence of the increased polarization. Partisanship of each location is mostly built in and fixed, with only small changes at the margin.

    I'd also say that the prediction models performed better than the polls. The models said that even with a typical type of miss on the polls, Biden would still win. It looks like there was a ~2 s.e. miss on the polls, and Biden will win by a small electoral margin.

  2. “…we could’ve forecast the election pretty much as well by taking Clinton’s vote and adding 2 percentage points for every state, as by doing all our poll analysis.”

    Can those 2 points be gleaned from analysis of the polling in 2016 and 2020? That is, is there a way to forecast (perhaps also state-by-state) by taking the difference between the polls and then just adding (or subtracting) that from the previous election's actual result?

  3. As we’ve discussed elsewhere, we can’t be sure why the polls were off by so much, but our guess is a mix of differential nonresponse (Republicans being less likely than Democrats to answer, even after adjusting for demographics and previous vote) and differential turnout arising from on-the-ground voter registration and mobilization by Republicans (not matched by Democrats because of the coronavirus) and maybe Republicans being more motivated to go vote on election day in response to reports of 100 million early votes.

    You left off the possibility of bald-faced lying by pollsters and the media.

    After years of lying about the Russian Collusion hoax, the Michael Brown hoax, etc., bald-faced lying should be one of the top hypotheses.

    • Terry:

      That seems ridiculous to me, most obviously because pollsters have a clear commercial motivation to be accurate, a point I discuss here. There are also obvious questions as to how it would be that dozens of pollsters and news organizations were all lying at the same time.

      • how [could it] be that dozens of pollsters and news organizations were all lying at the same time.

        Name one mainstream news organization that called Harris on her recent bald-faced racist lie that Michael Brown was “murdered”.

        How many mainstream news organizations held Biden accountable for his oft-repeated bald-faced racist lie that Trump called Nazis and white supremacists “fine people”?

        • Terry:

          1. I was specifically talking about the polls. You seem to be suggesting that dozens of polling organizations are all faking their data, and that nobody’s checking this. I think that’s ridiculous. It would represent a conspiracy involving thousands of people, up there with the conspiracy of the faked moon landing. Also something that directly damages the business interests of the people doing the dirty deeds. JFK conspiracies are nothing compared to this.

          2. It took me about 5 seconds to find a mainstream news organization that called Harris on her statement about Michael Brown. You can do it too. Just google *michael brown facts* and you’ll come to a Washington Post article on the topic. Similarly you can google *trump fine people* and right there on the first page you’ll see a USA Today article. They do quote Biden’s statements and I don’t see that he lied about it.

        • “It would represent a conspiracy involving thousands of people…”

          I love the old Ben Franklin line: “three people can keep a secret if two of them are dead”.

        • 1. You talked about "news organizations" and "pollsters". Does the dishonesty of news organizations extend to their related polling operations? I don't know, but it should be considered.

          2. I don’t allege they were faking their data. A thumb on the scale here, ignoring a weak assumption there can add up to serious bias.

          3. I don’t allege a conspiracy theory, which requires a meeting of the minds. I only suggest the possibility of a systematic bias. It seems undeniable that most of the media is systematically biased.

          4. I overstated when I implied that there was no acknowledgement at all in the mainstream media about the Harris lies about Michael Brown. It is clear, though, that most reports were content to leave the false claim unrebutted.

          5. Biden lied.

          Trump “condemned totally” neo-Nazis and white nationalists:

          “And you had people — and I’m not talking about the neo-Nazis and the white nationalists — because they should be condemned totally.”

          Biden has lied bald-facedly about this, tweeting that:

          “Three years ago today, white supremacists descended on Charlottesville with torches in hand and hate in their hearts. Our president said they were “very fine people.””
          https://twitter.com/JoeBiden/status/1293690094554099713?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1293690094554099713%7Ctwgr%5Eshare_3&ref_url=https%3A%2F%2Fwww.westernjournal.com%2Fbbc-fact-checks-joe-bidens-charlottesville-fine-people-claim-trump-really-said%2F

          Often, Biden was only highly misleading, saying that
          “President Trump said there were “very fine people on both sides””, dishonestly implying Trump called bad people “very fine people”.

        • > 2. I don’t allege they were faking their data. A thumb on the scale here, ignoring a weak assumption there can add up to serious bias.

          > 3. I don’t allege a conspiracy theory, which requires a meeting of the minds. I only suggest the possibility of a systematic bias. It seems undeniable that most of the media is systematically biased.

          Have you lost your comment?

          https://statmodeling.stat.columbia.edu/2020/11/06/comparing-election-outcomes-to-our-forecast-and-to-the-previous-election/#comment-1575637

          > You left off the possibility of bald-faced lying by pollsters and the media.

          Bald faced lying->I don’t allege they were faking their data. You’re telling a bald faced lie about your own comment!

          As for the neo-Nazi comment, I wouldn’t exactly call that Biden lying. There was a neo-Nazi/White supremacist rally, and Trump did say some of the people at the rally were very fine people. If you actually watch the videos of the rally, there were people in crowds of hundreds chanting “gas the kikes, race war now” and other similar mantras, so those “fine people” were at best, ambivalent about protesting alongside a crowd of mostly white supremacists. The “fine people” Trump refers to do not exist. I don’t think it’s because Trump actually thinks the neo Nazis are great—most likely he was too lazy to actually research the rally or watch the videos. But nonetheless, this isn’t about Biden at all! Why are you even bringing this up?

        • 1) Individual media companies might choose to report the polls that are convenient for them, but that is not what people like Andrew are doing. If you know of polls that he should have included, you can rerun his code with those polls added in, because the models are open source. See how it affects the results and let him know how his models can be improved; he seems very open to constructive criticism.

          2) “You left off the possibility of bald-faced lying by pollsters” – what were they lying about, if not their data?

          3) The pollsters obviously have some bias in their results, because they were some way off the actual result. But what specific mistakes can they fix for next time to reduce it, or how can the aggregators account for the biases?

          4) This is completely irrelevant to polls being accurate, if people believe politicians’ lies then that should be represented in the polls as a measure of voter intent.

          5) This is also irrelevant to polls being accurate.

        • My apologies for the stridency of my comments.

          Given the widespread feeling among the great and good that Trump had to go at all costs, it seems unsupportable to assign zero probability to the possibility that pollsters’ analyses were systematically biased by these feelings.

    • @Terry

      …yes, the casual presumption that all system factors in the official nationwide electoral infrastructure & mainstream private polling processes are honest & transparent — is highly dubious and certainly lacking in hard evidence substantiation. Opportunities for significant system error + deliberate manipulations are large.

      Polling Response-Rates are outrageously low and no amount of statistical tinkering can compensate for such shoddy sampling. Quality of the few responses harvested is also diminishing.

      The political opinion polling system is badly broken, but its purveyors and proponents have much invested in maintaining its previous public "image" of integrity.

      • I think what you're referring to at bottom is the fact that no one's got a landline or no one answers the phone at all, the way they used to. And the garbage e-mail generally goes into the spam bucket before it's even seen. And I trash it even if it comes from an organization I might share some sympathy with. Unless someone writes to me personally I ignore it. The robotic messages I was getting from …. were infuriating — try as I might, I'd have less chance of having a word with anyone than "K." would have in getting any recognition from his sponsor in the Castle (he actually had a note in-hand signed by Klamm).

        So, I speculate that, in a very concrete sense, the samples that are being gotten are “non-representative” in some special way; in the same way that juries tend to be over-filled with retirees and federal employees; the pool of those who respond at all may share some latent characteristics which bias the polls in the way they appear to be.

    • I agree we should be skeptical of the media, and I agree that the largest news media organizations have been unfair to Trump, almost certainly because of the personal feelings of the sorts of people who work there. On the other hand, the financial interests of these organizations point in the other direction, as Trump is an eminently entertaining President, generates an opportunity for clickbait articles every time he opens his mouth, and has generally energized the country's interest in politics. So the financial interests of these corporations serve as a countervailing force to the personal politics of the employees on election reporting. On top of that, it's not clear which way underreporting Trump support points. Maybe underreporting Trump votes gives him fewer votes because Trump voters get discouraged. Or maybe it gives Biden fewer votes because Democrats become overconfident in a landslide victory. But maybe they just want their friends to feel good about Biden's chances, regardless of material impact on the election?

      I agree with the sentiment of being skeptical *even without direct evidence* and analyzing what people say based on their relevant interests, but I don’t think it points in a clear direction here. It kind of feels like this attack could be launched at pollsters regardless of whether they over or underestimated Trump’s chances.

      • somebody –

        > the largest news media organizations have been unfair to Trump.

        What criteria do you use to evaluate “fairness?”

        Keep in mind that Trump supporters simultaneously argue that Trump is a master manipulator of the press, and that he is a victim of the press, that they believe in “personal responsibility,” and that Trump isn’t responsible for the press coverage he receives.

        • I think Trump is a casually-racist incompetent attempted-dictator who is personally responsible for lots of suffering. I don’t think spooky leftists are running the Times by making up stories–say what you will about liberals, but I think they tend to be obsessive about procedural correctness and formal codes of ethics. That said:

          1. I think the headline-industrial complex has become pretty ridiculous. It's not infrequent that a headline makes Trump sound more outrageous than he is. This is a consequence of the ridiculous situation where an actual journalist writes an article and some business-oriented copywriter writes the headline. Sometimes the headlines are even A/B tested to maximize clicks. Given how many people read only the headlines, this accelerates towards somewhere unpleasant.

          2. Which stories are selected for serious attention from investigative journalists is subject to the personal feelings of the journalists in question. To make a specific comparison, both the Steele Dossier and the Hunter Biden laptop are based on some pretty shaky sourcing, but the Steele Dossier got an impressive looking Buzzfeed investigation and the Biden laptop got a measly NYPost article. To be clear, both articles were very transparent about where exactly the information was coming from, but the fact is that no investigative journalists were interested in making waves about the Biden laptop.

          3. This isn't exactly "unfair" to Trump per se, but I strongly don't want social media platforms which are natural monopolies to start fact checking with their machine learning models. First, I wouldn't trust import sklearn or LDA or whatever with such an important task, but also these platforms both can't be competed with and are completely unaccountable.

          And yeah, the Republican rhetoric about Trump as an invincible strong man and a victim of the big bad press is pretty funny

        • somebody –

          Thanks.

          I largely agree.

          > 1. I think the headline-industrial complex has become pretty ridiculous. It's not infrequent that a headline makes Trump sound more outrageous than he is. This is a consequence of the ridiculous situation where an actual journalist writes an article and some business-oriented copywriter writes the headline. Sometimes the headlines are even A/B tested to maximize clicks. Given how many people read only the headlines, this accelerates towards somewhere unpleasant.

          I question the degree to which there is a clear directional change. We could go back to William Randolph Hearst times and maybe it was worse. I think it’s easy to fall into thinking there was some time where the press rose above the dirtiness of a profit model – but I don’t think it ever existed. My dad used to read I.F. Stone’s weekly, and from that perspective the press has long framed the news to advance a directional political agenda.

          I think there's an honest conundrum here. I don't know the answer. But the full context is important. The press is dealing with a powerful political entity that clearly sees no downside from straight-up dishonesty. In fact it's a feature and not a bug – as even overt dishonesty advances the goal of fostering distrust that anyone can know whether anything is true.

          I'm certainly not comfortable with the extent to which media outlets have been openly editorial and jettisoned the usual long-standing "He said, He said" socio-pragmatics of neutral reporting…but I'm not sure I know what the answer is, and I also react negatively to those who are pushing a self-aggrandizing "We're the only ones who are unbiased in our dedication to the 'truth.'" type of framing I see from folks like Glenn Greenwald. But after the impact of the nothingburger of the Clinton email scandal last election, at what point does a press responsibly recognize an obligation to be proactive in not creating a false equivalence – for example, equating reporting on investigations into clear indications of *potential* Russian interaction with Trump's campaign with reporting on Hunter's laptop?

          > 3. This isn’t exactly “unfair” to Trump per se, but I strongly don’t want social media platforms which are natural monopolies to start fact checking with their machine learning models.

          So that’s the framing that I react (maybe over-reacting?) to. I agree with you that these are important issues to suss out, but I just get tired of hearing about how much of a victim Trump is.

        • Agree with a lot of what you said.

          News media have been hugely dishonest many times in the past. Duranty’s denial of the Ukrainian famine pops to mind. Anti-Lincoln press. The press in the French Revolution.

          Which supports the notion that we should consider the possibility that bias has infected polling. I don’t know that it has or hasn’t. But it should be considered.

          Another way it could have infected the pollsters is by their turning a blind eye to the uncertainties. For a lot of reasons, prediction was particularly noisy this time. Perhaps honest polling would have simply upped the uncertainty in their estimates. But then they would be admitting they really weren't very useful.

        • >>by their turning a blind eye to the uncertainties

          This could also manifest by assuming too much about homogeneity within defined “groups” of voters, ie “Hispanic voters” or “white non-college voters” or whatever.

          Not bias in the ideological sense, but in the “unexamined assumptions” sense.

        • I think the news media improved in quality generally around the 40s and 50s around the time of the Hutchins commission and the establishment of the SPJ Code of Ethics. I don’t know that there’s any kind of strong enforcement in practice, but the sort of professional liberals who become journalists tend to be pretty prideful people.

          I think there's reason to believe in a major shift now because the death of print news subscriptions and their replacement by a click-based attention economy has created different economic incentives for the media as well as a different organizational structure, and the old institutions aren't designed to handle them.

        • > 2. Which stories are selected for serious attention from investigative journalists is subject to the personal feelings of the journalists in question. To make a specific comparison, both the Steele Dossier and the Hunter Biden laptop are based on some pretty shaky sourcing, but the Steele Dossier got an impressive looking Buzzfeed investigation and the Biden laptop got a measly NYPost article. To be clear, both articles were very transparent about where exactly the information was coming from, but the fact is that no investigative journalists were interested in making waves about the Biden laptop.

          The Steele dossier came out at a pretty empty time in the news cycle while the Biden Laptop thing was right before the election as an attempted October Surprise. It’s absolutely right to apply much more caution in the latter case.

        • I would argue, if anything, that if you look at the degree to which public opinion on "will Trump win" was entirely out of whack with polling, Terry is actually the opposite of correct. At least when it comes to estimating Trump's chances, if you take the entirety of polling and analysis of it in aggregate, the media generally were highly biased in terms of *pumping up* Trump's chances, to the extent that even people very opposed to Trump were convinced that he was going to win.

          If there was any malignant pressure on pollsters, it would be to exaggerate the closeness of the race. Nate Silver for example faced many such accusations.

  4. Apologies for beating my own drum repeatedly, but as I mentioned on another thread, it was possible to do better with the polls, if you assumed state level errors this year would be similar to what we saw in 2016:

    https://thehumeanpredicament.substack.com/p/presidential-election-forecast

    Counting only states with >99% of precincts reporting, here is how things looked last night (predicting Biden’s share of the 2-party vote):

    Bias: +1.1% (me), +2.6% (Gelman), +2.7% (Silver)
    RMSE: 2.4% (me), 2.9% (Gelman), 3.2% (Silver)
    Mean AE: 1.8% (me), 2.6% (Gelman), 2.7% (Silver)
    Median AE: 1.1% (me), 3.0% (Gelman), 2.8% (Silver)
    Max AE: 6.8% (me), 5.2% (Gelman), 6.2% (Silver)
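
    For reference, the summary statistics above can be computed from predicted and actual Biden two-party shares along these lines (placeholder inputs, not the actual state numbers):

```python
# Sketch of the error summaries listed above, with hypothetical inputs.
import numpy as np

predicted = np.array([0.53, 0.51, 0.49, 0.55])   # hypothetical state forecasts (Biden share)
actual    = np.array([0.50, 0.50, 0.48, 0.52])   # hypothetical state outcomes

errors = predicted - actual
print("Bias:     ", errors.mean())               # positive = overestimated Biden
print("RMSE:     ", np.sqrt((errors**2).mean()))
print("Mean AE:  ", np.abs(errors).mean())
print("Median AE:", np.median(np.abs(errors)))
print("Max AE:   ", np.abs(errors).max())
```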

      • Yes, it’s not an algorithm that can be applied blindly to other elections. But the goal was to predict the election correctly. I came up with an approach, plugged in polling data from 2016 and 2020, as well as the 2016 results, and published what came out. (I have a program that reproduces it from these data that I’ll share when I get around to formatting it more nicely). Not claiming to be an expert forecaster, just that a reasonable hypothesis one might have had before the election (see my reply to Andrew below) could make a simplistic approach more accurate than the predictions of pros like Silver and Gelman. I think that’s noteworthy, even if it doesn’t mean I’m the new Nate Silver!

  5. > Based on news reports so far, we’re guessing that the votes for Congress were closer to 50/50 with only a very narrow edge for the Democrats.

    Which is an interesting contrast to other recent elections, where pub congressional candidates gained seats even as they were outperformed by dem congressional candidates.

    So would it be accurate to say that this year, Trump underperformed pubz overall?

  6. Thomas, Aleph:

    Yes, if you were to assume ahead of time that the polling errors from 2020 were the same as in 2016, you could do better. In retrospect, that would’ve been a good call. But during the campaign we assumed that the pollsters had fixed these problems on average. Also, the polling errors in 2016 and 2020 could be coming from different sources. In 2020, I’m guessing that differential turnout was a big issue that wasn’t happening in 2016.

    • Andrew:

      My own hypothesis prior to the election was that the correction made by pollsters was misspecified. I’m thinking that Trump supporters who we have data on (via pre-election and post-election polls) are fundamentally different in unobserved ways from those who we don’t have data on. This means that weighting for education, which seemed to be the strongest predictor of Trump support, is sort of like a misspecified model-based adjustment. If that’s right, then we wouldn’t expect errors this year to be systematically different from what we saw last time, though they of course won’t be identical. My approach was to use a simple regression model to average the 2016 polls, smooth the observed errors from that model’s predictions (essentially averaging state-specific errors with the average error across states), and then apply that smoothed error-correction to the predictions of the same model fitted to the 2020 polls. It looks like even that approach underestimated Trump, so it’s clearly not the whole story, but besides a small Biden bias it looks like these predictions are mostly spot on, for what that’s worth.
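
      A minimal sketch of that smoothing step, under placeholder inputs (not my actual regression or data), might look like this:

```python
# Sketch: shrink each state's 2016 polling error toward the average error
# across states, then subtract the smoothed correction from the 2020 poll
# averages. The weight w and all numbers are illustrative assumptions.
import numpy as np

err_2016  = np.array([0.030, 0.045, 0.010, 0.020])   # hypothetical 2016 poll-minus-outcome errors (pro-Biden)
poll_2020 = np.array([0.53, 0.52, 0.50, 0.54])        # hypothetical 2020 poll averages (Biden share)

w = 0.5                                               # assumed shrinkage toward the national mean error
smoothed_err = w * err_2016 + (1 - w) * err_2016.mean()

corrected_2020 = poll_2020 - smoothed_err             # apply the smoothed error-correction
print(corrected_2020)
```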

        • That is certainly a methodology worth considering. But I think it is wrong to use its relative accuracy in this election as an indication that it was a better methodology. It is the same problem that people exhibit in NHST when they reject the null: they take that as evidence supporting their own particular alternative hypothesis. There were many methodologies that could have produced more accurate results in this election – how can we choose between them? I'd say they would need some kind of "replication." In this case, true replication is not possible, but at least the methodology should hold up as well when applied to earlier elections.

        • Dale:

          The replication here would be the predictions across the various states. These results are correlated, but not perfectly so, which means there effectively are multiple independent observations against which this is being validated. I agree, however, that because of the high correlations, and low generalizability, the power is presumably pretty low, so maybe our performances are statistically indistinguishable when you take all of that into account. And as I said in reply to Zhou Fang above, this is less a methodology that can be applied to other elections than a test of a hypothesis about polling and Trump voters. This is a noisy test of that hypothesis, I’ll grant that much, but I think it’s informative.

  7. > Setting aside Illinois and Hawaii, we could’ve forecast the election pretty much as well by taking Clinton’s vote [ and adding 2 percentage points for every state ], as by doing all our poll analysis

    Is the part in brackets required? Looking at the chart it seems that simply taking the previous results is not worse than your point estimates.

  8. I don’t know if I buy this explanation.

    Perhaps one of the reasons the polls were taken for granted is that they were consistent with Trump’s approval rating. And that approval rating is consistent with many other things we see and hear about Trump, right down to screaming outrage and impeachment.

    Maybe Trump supporters are “shy” about approval ratings as well? It’s believable that the approval rating surveys are subject to the same errors as the polls, but it really would be shocking if Trump’s approval rating was secretly at 47% or 48% and people were simply hiding that from the surveyors. I think a lot of protesters would be shocked to find that too.

    It makes more sense that a lot of people just changed their minds at the last minute when faced with the reality of the choices. Trump’s approval rating really is low. A lot of people really *didn’t* want to vote for him. But in the end they saw worse consequences by voting for Biden and Democrats.

    That explanation is consistent with the data: Republicans overall did *better* than Trump – they were supposed to be dragged down by him but instead they outperformed him. Frankly that demands a lot more explanation than the polls. In effect, many people were moved to support the Republican agenda but rejected Trump.

    • Jim –

      > Maybe Trump supporters are “shy” about approval ratings as well? It’s believable that the approval rating surveys are subject to the same errors as the polls, but it really would be shocking if Trump’s approval rating was secretly at 47% or 48% and people were simply hiding that from the surveyors. I think a lot of protesters would be shocked to find that too.

      If we think of a “silent Republican voter” as opposed to a “shy Trump voter,” it would mean a differentially higher non-response among people who vote Pub.

      And indeed, the pubz overall outperformed the polling just as Trump did.

      Such a group would likely show a differentially higher non-response to favorability or job-performance polling as well.

      • “If we think of a “silent Republican voter” as opposed to a “shy Trump voter,” it would mean a differentially higher non-response among people who vote Pub”

        Sure, and that’s fine, but how do you explain that? Why are there so many “silent” Republicans? Why are they hiding?

        It’s just as easy and believable to say that people in the center swung the other way at the last minute. I understand why pollsters and statisticians don’t like that explanation. It brings the full error range into the realm of possibility and thus implies this kind of forecasting and modelling is less reliable than advertised. But the full error range really is in the realm of the possible. That’s why it’s there – because that level of variation can’t be constrained.

        Perhaps the real question isn’t about the accuracy of the mean, but about the quoted error range. Maybe that was too small.

        • Jim:

          Our model does allow for changes in opinion during the last week of the campaign. You might want to allow larger changes; that’s a particular modeling choice. Regarding the error range: adding more uncertainty to the forecast wouldn’t really help. The Fivethirtyeight team added a huge amount of uncertainty, including giving Biden a 6% chance of winning South Dakota and giving Trump a chance to win 58% of the two-party vote in Florida, among other things, and that didn’t help them out either. Increasing variance is no substitute for shifting the mean.

        • Yes I realize additional uncertainty doesn’t necessarily change the mean but at the very least it would expand the error intervals, which should imply more room for error.

    • >>Frankly that demands a lot more explanation than the polls.

      Actually, I think this is not terribly strange, as this is a chaotic time where people want stability, as opposed to 2016 where people were leaning anti-establishment, and Trump is a destabilizing force rather than a stabilizing one.

      Also, Trump is not much of a classical Republican.

      For example, I voted for Biden, and for a number of down-ballot Republicans – and both of these were essentially votes for “stability” (I live in a red county of a red state and am generally more or less happy with my state/local government)

  9. I’m going to mildly disagree with your self-criticism, if it was that, of saying you could have adjusted for larger poll error. Yes, but when data sources are bad, the attention should be on the methodologies used by pollsters and how those can be improved, rather than on adding error at this analysis level.

    I was greatly surprised by the polling failure because it has been 4 years and pollsters are professionals, and they have had some access to not only the amount of mail voting but also its composition. And there's money put into polling, so it's rational to think competition to get more accurate polls would generate more accurate polls.

    Then again, it may turn out over the next few election cycles to have been a Trump effect. For whatever reason. I include an effect often noted in social science, that expectations shape results. They are sometimes hacked to get results, but even then that’s generally to match results to expectations. Pollster methods and models may show that.

    I find it hard to believe Trump received the 2nd highest number of votes ever.

    • >>I find it hard to believe Trump received the 2nd highest number of votes ever.

      It’s surprising in the sense that Trump is IMO clearly not qualified to lead the nation – but it’s not surprising, in the sense that the nation is fairly closely divided and population growth + high turnout = record number of votes cast.

      • +1.

        I believe that all US voters should be congratulated for participating in the election despite the difficulties many of them faced (long queues, risk of catching COVID-19 etc) and making the turnout so high.

  10. “In effect, many people were moved to support the Republican agenda but rejected Trump.”

    Now I am wondering how much the virus had to do with it. We have two relationships with the pandemic, personal and social. In our personal relationship, we may get sick and we may lose our jobs, or we may be forced to work when we feel it is unsafe. But we personally lose nothing by OTHER people going back to work, and in fact having others keep the economy running is the best outcome from a game theory perspective.

    If Trump does any one thing, it is getting a cohort of Americans to think only about themselves. There is no “what can you do for your country?,” there is only “what can your country do for you?” So I think pandemic fatigue had a major effect in the last few weeks, when folks heard Biden’s plan and just started dreading two more years of partial shutdown.

    But then since the same thing kind of happened last time, maybe the evidence does not stack up so well.

    • ‘There is no “what can you do for your country?,” there is only “what can your country do for you?” ‘

      Too funny, just last night I came across an old vid of Ayn Rand on the Johnny Carson show. She believed that acting in one’s own *rational* self interest is the highest moral calling and that if all people did so society would be better off.

    • Matt,
      NPR has a story today showing that Trump tended to do better in 2020 than in 2016 in counties that had more Covid-19: https://www.npr.org/sections/health-shots/2020/11/06/930897912/many-places-hard-hit-by-covid-19-leaned-more-toward-trump-in-2020-than-2016.

      It seems to me that Covid-19 could have had two effects: (1) some people want so badly for life to get back to normal that Trump's magical thinking was attractive, and (2) the Democrats did much less face-to-face work to get out the vote than would have been the case otherwise.

        "(1) some people want so badly for life to get back to normal that Trump's magical thinking was attractive,"

        Or:

        1b) that business and work shutdowns are really hurting people and they don’t want more.

        • Well, polling ain't looking that great these days, but what you missed with (1b) is that polling shows that a lot of people favor government interventions to slow the spread of the virus. Maybe polling isn't that trustworthy, but I have no particular reason to implicitly trust your interpretation of public sentiment. It could be correct, but it's pretty much guaranteed that it's influenced by your ideological biases.

        • Exactly. Commerce is suffering. Reach a consensus on what to do to allow for commerce, while at the same time mitigate disease transmission — e.g. conduct what commerce can be done outside; e.g. invent UV-C treatments for HVAC. There are practical problems that require practical solutions. If you have an earthquake and it destroys all the buildings in your town, you put up temporary structures that don’t get shaken down. You don’t get into arguments about whether glass cathedrals should or should not be the order of the day. Let the engineers do their damn work!

      • This seems like a classic case of reverse causation at play.

        The story should really be that places where Trump retained or gained popularity since 2016 were worse at containing Covid19.

  11. > ” So I think pandemic fatigue had a major effect in the last few weeks, when folks heard Biden’s plan and just started dreading two more years of partial shutdown.

    Overall, polling showed that people were generally in favor of government interventions to mitigate the pandemic. But there is some evidence of such interventions becoming less popular in the weeks leading up to the election.

    But it could be a mistake to assume the direction of causality there. Partisan orientation could drive perspectives more strongly as the election approaches just as much as the causality could run in the other direction.

    We often assume that values or policy preferences drive partisan orientation when actually it can be the other way around.

    • +1. For all the talk on this blog about forking paths and adding explanations after the fact, certainly one possible explanation is that areas with higher COVID rates voted more strongly for Trump from COVID fatigue or the like, but the converse explanation is that areas with strong Trump backing listened to him, have taken few if any precautions, and therefore have a higher rate of COVID infection. I do not claim that I know which is true, just that after the fact either explanation can fit the data, depending on what it is you have decided you want to prove. This is just why you need to be very careful in post-hoc analyses: you find one wiggle, you look for another wiggle, and then say that was a cause. But there are a lot of wiggles out there.

  12. “only the second time this century that a presidential candidate has received more than 52% of the two-party vote.”

    LOL – there have been only 5 or 6 elections this century. It is also only the second time this millennium!

    • Kl:

      Lol all you want, I don’t care. The elections since 2000 are a natural dividing point. It has nothing to do with it being a new millennium or any other quirk of the calendar numbering system.

      As I wrote, in the context of American history, a presidential election margin 52-48 or 53-47 is a narrow win. In the context of the recent era of political polarization, 52-48 or 53-47 is a pretty big win. Gore, Bush, Obama (in his second campaign), and Clinton all won with less than 52% of the two-party vote. National elections since 2000 have all been very close by longer-term historical standards.

        • Yeah, it’s a funny list. “Gore, Bush, Obama, Clinton” should be “Bush, Obama, Trump”. Well, two outta three ain’t bad.

        • Phil:

          No, I was specifically talking about the margin in the popular vote. My point is that in the history of the U.S., 52-48 or 53-47 is a close margin for the national vote, but since 2000 it’s a large margin.

        • Rm:

          There have been very close presidential elections several times in American history: during the 1880s, then in 1960/1968/1976, then 2000, 2004, and 2016. 2012 was pretty close too. Looking at all of American history, a candidate winning 52% or 53% of the two-party vote is a close win. Not the closest, but still close. Looking at the period since 2000, it’s a large win.

        • Great article as usual, and I’m happy Biden won. But there have been only three presidents since 2000; Bush and Trump were (initially) elected with negative vote margins!

          There is a current argument about whether the close election rejects the extreme left, or whether the woke movement energized the party and helped turnout.

          These close elections are great for T.V. ratings!

  13. If I have a prior on vote share / turnout that's essentially even between both parties, and I observe polls that favor one side by 10 but I know there could be measurement error, should / does that mean that my best guess should be one side up by 7 or so, because if there is measurement error (and there usually is), it is particularly unlikely to go in the direction of the party that is up in the polls? (i.e., measurement error is not mean zero)

    Suppose the only thing that varies / is unknown from year to year is the turnout projection used by the pollsters. If they report one side +10, I know that they predict turnout to be on the extreme end for that party. So it is more likely the pollsters over-predicted turnout for the side that is up.

    This may come down to whether we have classical measurement error or non-classical ("complicated") measurement error. To figure out what type of measurement error we have, we need to know what the goal of the pollsters is. Do they report the mean of survey responses? (probably not quite) Do they report E(vote share | survey responses) using some kind of prior of their own? Maybe, but they may also be reporting the median or mode. I don't know what their incentives are. But I think there's a stats literature suggesting it matters (and it matters what their priors are), e.g. Hyslop & Imbens (2001); I'm sure you are much more familiar with this literature than I am, and would be curious to hear your thoughts on how / if it connects to (a) the forecasting problem in general, and (b) what might have gone wrong this time specifically. Or did the models you know from this year already take all this into account?
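
    One way to see the intuition behind the "+10 read as +7" question is a simple normal-normal shrinkage calculation. The prior and error scales below are placeholder assumptions chosen only to illustrate the mechanics, not anyone's actual model:

```python
# Sketch: an (assumed) roughly even prior on the true margin plus a noisy poll
# reading pulls the best guess part of the way back toward zero.
prior_mean, prior_sd = 0.0, 0.06       # assumed prior on the true margin
poll_margin, poll_sd = 0.10, 0.04      # poll says +10, with assumed total error (sampling + nonsampling)

w = prior_sd**2 / (prior_sd**2 + poll_sd**2)   # weight placed on the poll
posterior_mean = w * poll_margin + (1 - w) * prior_mean
print(posterior_mean)                  # roughly +7 under these assumed numbers
```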

    • Mp:

      Our model does include non-sampling error (see here). That’s why our intervals are as wide as they are. The key question is whether we could’ve addressed possible polling errors by shifting the mean rather than just adding uncertainty.

      • Maybe there is a margin shrinkage that could be applied?
        Reputable predictions suggest Biden will win (90% chance), so Biden voters are less likely to invest in voting, so actual chance is 80%?

        • Mbn:

          It seems that the Republicans did better than Democrats in the turnout race this year, but I don’t attribute that to the forecast; I attribute that to campaigning and coronavirus.

  14. > As we’ve discussed elsewhere, we can’t be sure why the polls were off by so much, but our guess is a mix of differential nonresponse (Republicans being less likely than Democrats to answer, even after adjusting for demographics and previous vote) and differential turnout arising from on-the-ground voter registration and mobilization by Republicans (not matched by Democrats because of the coronavirus) and maybe Republicans being more motivated to go vote on election day in response to reports of 100 million early votes.

    I think these are reasonable guesses, although I don't think that history will find much evidence that, prior to the election, Trump supporters were explicitly less motivated to vote – from a pollster's perspective. As you also state, ultimately Biden secured a greater share of the final vote than his predecessors of recent elections. It was a high-turnout election, and it seems there weren't that many more likely-voting Democrats hiding in the woodwork.

    I go back to my questions from the other day, which I’ll boil down to the following single question:
    – For every 10 actual Biden voters who would also wilfully and truthfully participate in an independent, credible poll, how many Trump voters would do the same? I don't think it's close to being equal, and obviously I think it's considerably lower. If I were a Trump voter deep in the QAnon / Breitbart base, I'd look at a poll that would eventually be picked up by 538 or the Economist as a symbol of the establishment. I don't know how exactly I'd respond to the poll as a Trump voter, but I know it's very likely I wouldn't respond with the same likelihood or rationality as a Biden supporter.

    In two elections, there is a systemic error of an underreporting of the number of Trump voters and/or their intentions. I know it’s conjecture to suggest that inaccurate polling of ‘shy’ or ‘dishonest’ or ‘disconnected’ Trump voters are the *primary* source of this error, but in my view, I don’t think this possibility can continue to be ruled out (as it had been before the election).

  15. Andrew and Elliott:

    Thanks for continuing to examine model calibration.

    What you’ve posted for 2020 is encouraging, though vote counting may not be complete enough yet (?) to be sure.

    "Coverage isn't too bad, but that's a matter of luck… These 50 outcomes are highly correlated, so you can't expect to measure calibration as if they were 50 independent forecasts."

    For this reason, calibration checks seem incomplete without 2008, 2012, and 2016 calibration analyses. Including all years of available data at least allows four elections that you can look for patterns across. By comparison, any one election (including 2020!) is more likely to be misleading about actual model calibration.

    I think this is also why it would be useful to provide the p-value the model assigns to the overall set of state election vote shares that occurred in 2020 — and similar p-values for 2008, 2012, and 2016. Those p-values could help you investigate overall calibration without the obstacles posed by the nonindependence of state results.
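
    One possible form such a joint check could take (a sketch with placeholder draws, not the actual forecast output) is a posterior-predictive-style discrepancy comparison:

```python
# Sketch: compare a joint discrepancy statistic of the observed state shares to
# its distribution across posterior simulation draws. All inputs are placeholders.
import numpy as np

rng = np.random.default_rng(1)
draws = rng.normal(0.52, 0.02, size=(4000, 50))     # hypothetical posterior draws (sims x states)
observed = rng.normal(0.50, 0.02, size=50)           # hypothetical observed vote shares

center = draws.mean(axis=0)
T = lambda y: np.sum((y - center) ** 2)               # joint discrepancy statistic
p_value = np.mean([T(d) >= T(observed) for d in draws])
print(p_value)   # small values flag outcomes the model found jointly surprising
```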

    Last, that graph comparing 2016 and 2020 polling errors is insightful! Thanks for posting it.

  16. This is very easy to explain, and I have been saying this for a while. The race was never as lopsided as the polls showed, for the same reason Clinton was in trouble in 2016. Anybody who would eventually vote for Biden was not "undecided". Trump was always going to get nearly 100% of the undecided vote. Polls showing Biden leading in PA 50% to 44% was NEVER a 6 point lead. It was a tie because Trump was going to get nearly all of that undecided vote.

    • “It was a tie because Trump was going to get nearly all of that undecided vote.”

      Exactly. Trump is an easy candidate to be undecided about. And I suspect a lot of people who were "decided" about Trump weren't very happy about it.

      • It doesn’t matter whether the “answer” is easy or “hard.” It matters whether it’s right or wrong. Overly complex mousetraps are the most likely to fail, that’s why simple mouse traps are still the most common kind.

        • Is there any basis to this “easy answer” other than gut instinct and anecdotes?

          My dismissal of easy answers is not about seeking complexity, it's about doing the hard work to obtain the truth and to self-criticise. You're doing the equivalent of finding a dead mouse and assuming this proves that whatever the nearest thing to it is, is a mousetrap.

          A simple bit of consideration shows that
          > Polls showing Biden leading in PA 50% to 44% was NEVER a 6 point lead. It was a tie because Trump was going to get nearly all of that undecided vote.

          is complete rubbish. The bulk of that 6%… and a fair chunk of those 50% and 44% shares, will be non-voters. Despite what people seem to think, a heck of a lot of people were undecided enough by election day to simply not vote.

        • I mean, polling on whether they plan to vote basically never *underestimates* turnout. That would be a useful starting point if you want to reason about these matters.

  17. Andrew,

    You mentioned differential turnout as one of the potential explanations for the polls being off. In your model, how did you estimate lost votes due to vote-by-mail challenges like signature mismatches and undelivered/late ballots?

    Vote-by-mail was new in a lot of states, and both the USPS situation and the huge effort to get people to vote early were unprecedented, so I could certainly imagine these estimates being off by a few percent … which would be enough to account for a lot of the polling error.

    • I think this is a very important point. So-called “mismatched signatures”, challenged provisional votes, and undelivered mail-ins all represent a difference between the intended vote and the counted vote. Polls only measure the intended vote. Further, challenges are more vigorously pursued in battleground states. I suspect the Democrats would have taken about 80% of these intended but uncounted votes which would account for between half to one percent of the 2.5 percent polling difference.

  18. How are the poll samples drawn? Are they random, state-wide samples? How is the data collected? Might they have a bias in favor of the population living in big cities?

    It seems pretty clear that the largest differences (beyond black vs. white voters) are between big and small cities, or perhaps even between urban and rural areas. Intuitively, I would say that polls probably are not measuring the voting intentions of small-city populations very well.

  19. Isn’t the question not whether the error bars were wide enough, but whether the forecast was actually informative? The 2016 baseline is likely to be closer in most states (and almost every swing state). Even the national popular votes polls seems to have been off by quite a large margin. So someone looking at polls and everything was worse-informed than someone looking just at the 2016 results. And someone looking only at 2016 and fundamentals would arguably have gotten this election spot-on.

    It seems like one possible conclusion is that, in an era of very low response rates, accurate polling is just not possible. The iPhone can now automatically silence calls from unknown numbers, if that becomes widespread then that seems apocalyptic for polling by phone at least. Maybe we just have to accept that it’s very hard to predict elections. And I think this really calls into question whether polling on other things is at all accurate.

    • Matt,

      The apocalypse for phone surveys isn't coming; it happened by stages over the last 20 years or so. What percentage of the voting-eligible public do you think is ever, under any circumstances, going to be reachable by an RDD political poll nowadays? I doubt it's anywhere near 10%, probably much lower than that. Belief that a clever statistician can "correct" data from a self-selected single-digit percentage of voters is just magical thinking.

      It looks to me like the only way to make a meaningful forecast from phone polling data is to layer assumption after assumption on top of “fundamentals” and other non-poll-based information.

      It's like that parable about making soup from a nail. You either have to make the soup using a bunch of non-nail ingredients or you end up drinking hot water that tastes vaguely like a nail.

      • I agree, it’s been a long time coming. Per https://www.pewresearch.org/fact-tank/2019/02/27/response-rates-in-telephone-surveys-have-resumed-their-decline/, it was about 6% in 2019, and someone told me that it was even lower this year.

        Per this, it was about 15% in 2008. I can believe that it is possible to extract a lot of information from a 15% response rate. I’m sure pollsters will say they can get information out of a 6% response rate, although I don’t believe them. But if it declines further (presumably eventually auto-silencing will be on by default), then it becomes even more hopeless.

        Personally, I think I stopped answering calls from unknown numbers around 2014. That’s when I remember a huge wave of spam calls. Now I’m not sure there is anyone I know who would pick up a call from an unknown number. So the people who do pick up calls from unknown numbers must be extremely weird.

  20. Do we have intuition or information on whether/how much communication of a forecast affects voter decisions? Perhaps someone seeing their side fall behind prompts them to vote, or seeing a likely win and decides their vote isn’t needed, thus changing the outcome.

  21. I keep going back to an observation I made months ago. There was a very significant increase in Biden’s chances in the forecast over a relatively short period of time. I said at the time that I thought that was an indication of a problem with the forecast. It seems to me to be unrealistic to think that people’s views on something so basic to their identity would shift that much even over a long period of time, let alone a short period of time.

    I’m not sure if that’s an underestimation of uncertainty or an underestimation of certainty (the certainty that in the end people would come back home and the vote would be closer to 50/50 than the polls indicated).

    If you had asked me 8 months ago to make a prediction, I would have predicted pretty much EXACTLY what happened. I would have said that no matter what happens with the panic, Trump supporters will convince themselves that he did exactly the correct things to deal with Covid.

    Then I looked at the polling and looked at the outcomes from the pandemic and fooled myself.

    • I totally agree. This is an extremely polarized nation and it is clear to me there are no events that will move the result outside of an extremely narrow range. 52-48 +/- 1% is probably the maximum range. However, that 1% range is amplified 100x by the electoral college. Biden will probably win by 5%, but all it would take for him to lose the EC is going from a 5% margin to a 4.5% margin.

      • Yes. And it's pretty amazing that demz have gotten more votes for prez in 7 of the last 8 presidential elections, yet pubz have so much control over policy. The minoritarian aspect is striking and I have to wonder if it's sustainable much longer. Pubz will say that "democracy = mob rule," but at some point if the imbalance gets too big it might have to change. It is interesting, however, that the imbalance between congressional race outcomes and share of the vote may be closer to equal this year than in more recent elections. Even though I don't like that pubz did relatively well, in some sense it's healthier from a "democracy" angle.

  22. > “had Biden done a little bit better, his results would’ve been inside all 50 of the intervals; had he done a bit worse, many of the state intervals would’ve missed the mark. These 50 outcomes are highly correlated, so you can’t expect to measure calibration as if they were 50 independent forecasts.”

    This is related to the general idea of model checking in hierarchical models. To be clear, there are two levels of correlations: (1) the correlation of predictions (corr(simulation draws), i.e. "50 outcomes are highly correlated"), and (2) the correlation of prediction errors (corr(predict − obs)). The correlation of the prediction errors is not necessarily a result of correlations in predictions. A mean-field but highly one-sided-biased prediction can also produce highly correlated errors. The bias term may come from a nationwide trend that is not modeled in the forecast.

    The correlations in errors make model evaluation challenging. The immediate consequence is a much smaller effective sample size for ensuring the validity of empirical estimates of any predictive scores or calibration graphs. That is the "luck" part.
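
    A rough way to see how much the effective sample size shrinks: under an assumed common correlation rho among the state prediction errors, the variance of their average corresponds to far fewer independent observations.

```python
# Illustration of the effective sample size under equicorrelated errors:
# Var(mean of n equicorrelated errors) = (sigma^2 / n) * (1 + (n - 1) * rho),
# so n_eff = n / (1 + (n - 1) * rho). The rho values below are assumptions.
n = 50
for rho in [0.0, 0.3, 0.6, 0.9]:
    n_eff = n / (1 + (n - 1) * rho)
    print(f"rho = {rho:.1f}  ->  effective n ~ {n_eff:.1f}")
```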
