So, what’s with that claim that Biden has a 96% chance of winning? (some thoughts with Josh Miller)

As indicated above, our model gives Joe Biden a 99+% chance of receiving more votes than Donald Trump and a 96% chance of winning in the electoral college.

Michael Wiebe wrote in to ask:

Your Economist model currently says that Biden has a 96% chance of winning the electoral college. How should we think about uncertainty here? Should we bootstrap by sampling (with replacement) from the underlying polling data?

I replied:

Our model is described here. The uncertainties come from our estimate of historical polling errors.

Then again, our backfitted predictions seem to be overconfident . . . so I don’t fully believe our numbers either.

So what to do? How to think about these probabilities? I had a long conversation with Josh Miller and here’s where we are on this:

Our model was not designed to make sense in the tails; we didn’t think through what would happen if the vote share needed for Trump to win became a tail event, or consider the fact that this was inevitable if Biden were to stay steady in the polls.

Had we left room for some ad hoc adjustments to our model to account for the possibility of lower turnout due to coronavirus (a fear from a few months ago) or ballots being thrown out (a possibility being talked about now), that would not have been any more arbitrary than many of the settings that already need to be made when setting up any predictive model (except that the timing of the adjustment would come later).

Three good reasons not to believe our numbers

1. Our forecast had calibration problems with state vote forecasts in the past. Nothing that seemed to mess up the national forecast, but still.

2. Other people are giving lower forecasts. Fivethirtyeight, the prediction markets, everybody. Now, I have problems with each of these: the Fivethirtyeight forecast has some deeply wacky scenarios, and the prediction markets will give you 34-1 odds that Biden wins California (“lay” 1.03 here), which would be a fair bet if Trump had a 3% chance of winning California, which is approximately 3 percentage points higher than it should be. That said, other people’s views must count for something, right?

3. Big things not in the model can happen, whether these be surprises in voter turnout, unexpected polling errors, major vote suppression . . . not to mention bugs in our code and conceptual errors in our model.

I’m reminded of the finding in psychology that it’s difficult to assign calibrated uncertainty intervals (Alpert and Raiffa, “A progress report on the training of probability assessors,” 1982) and the discussion in Mosteller and Wallace (1962) of extremely low probabilities and the influence of “outrageous events” in their analysis of the Federalist Papers.

There are other issues with our model. Our fundamentals-based forecast is not based on a generative model (for example, in June we predict the election based on the fundamentals in June, not on predicted fundamentals in November), and while we think our state-level correlation model is basically reasonable, it’s definitely a hack, and I’m sure that if we were to look carefully enough, we’d find some predictions and comparisons from our model that don’t look right. Given the small number of previous elections, some hackiness is unavoidable, but still.

If we think Pr(Biden gets more votes) to be less than 99% and we think Pr(Biden wins electoral college) to be less than 96%, how do we get there?

Of course we could just adjust the numbers. But, even if you know where you want your inferences to be, a model is still useful in helping you incorporate new data. Recall that a big motivation for fancy forecasts is that simple poll aggregation has problems because polls are of uneven quality. We take it a step further and integrate state and national polls. Ultimately the inference is only as good as your assumptions, but a model does allow you to fold in new data in a reasonable way. If you start with poll aggregation and then consider state and national polls and then model polling biases and changes in state opinion over time . . . then pretty soon you have a model.

Also, speaking more generally from a statistical workflow perspective, if you have a model and it gives predictions that don’t seem to make sense, this is an opportunity to learn and do better. It’s not time to give up!

OK, so there are five ways I can think of changing our model to lower those probabilities:

1. Shift the fundamentals-based forecast. In this particular election it didn’t seem to matter much because it’s consistent with the polls, but if there were a discrepancy—for example, if the fundamentals-based model put Biden at 54% of the national vote but the polls had him at 50%—then this would make a difference in our prediction. Our model allows for systematic polling error, so even on election day the fundamentals-based prediction provides some information.

So one way to lower Biden’s win probability is to say that the fundamentals don’t favor him as much as we thought they did. I don’t really want to do this adjustment because I don’t see any good reason for saying that. We could increase the uncertainties in the fundamentals-based forecast, and that will lower Biden’s win probability a bit, but not much.

2. Change what polls you include and how you adjust for house effects and state/national polling discrepancies. That can make a small difference. For example, right now we forecast Biden at 51.5% of the two-party vote in Florida, while Fivethirtyeight gives him 50.9%, so pretty similar but not identical. Again, it’s hard for me to see a good reason for changing the polls that we include, especially given that we anchor our forecasts to the surveys that adjust for partisanship of respondents. By comparison, the betting markets have Trump as a slight favorite in Florida, which I’d attribute to bettors performing some sort of implicit averaging of current polls and the 2016 election results.

3. Widen the uncertainty in the center of the forecast distribution, let’s say the forecast 50% or 80% interval. These uncertainties are the consequence of our assumptions about the possible sizes of polling errors, opinion swings during the campaign, and uncertainty in whose votes are counted. In the past we would’ve called this uncertainty in turnout, but if someone turns out to vote and then the vote is thrown out, then that’s not really a turnout issue; it’s something else. Anyway, all these uncertainties, as well as their correlations between states, determine the rough uncertainty in the forecast distribution for each state and also the national popular and electoral vote. There’s still a judgment call on how wide to make these intervals—I don’t want to end up saying that Trump might get 58% of the vote in Florida—but there’s some room for maneuver here. I’ll get back to that point later on in our post.

4. As discussed in item 1 here, there are the tails of the forecast. You can keep our national forecast of 54.3% Biden and the 50% interval of [53.3%, 55.3%], or whatever it is, but still make the tails as wide as you want, for example if you’d like a 1% chance that Biden gets less than 48% or more than 60% of the vote, which are currently way outside our intervals but, hey, anything could happen.

5. We can add a directional shift to our model, reflecting all the talk surrounding vote counting, along with the fact that the Republican party controls the federal government and many state legislatures. My colleagues have done some work along these lines but it hasn’t gone into our forecast.

Regarding this last point, Sam Weiss took a look at poll-based forecasts and prediction markets for the state election results and found some differences that were correlated with partisan control of the state. This does not not not mean that the prediction markets are integrating all relevant information—again, remember those ridiculous California odds—but people are reacting to the news in some way.

Listing the above options doesn’t in itself resolve any of our problems; I just think it’s helpful to see the levers that we have at our disposal. If we think that Biden’s win probabilities are too high, we can lower them by: lowering his fundamentals-based forecast (for example, by changing how we measure economic performance); lowering his poll position based on a judgment of systematic polling error (for example, by asserting that we think the polls are biased against Trump, even those polls that adjust for the partisan composition of their respondents); lowering his poll position because of our assessment of turnout, or of the possibility that Republican-friendly courts will throw out hundreds of thousands of votes; widening the 50% and 80% uncertainty intervals based on the judgment that unexpected things can happen in either direction (for example, higher Democratic turnout in response to threats of vote suppression); or widening the far tails beyond the normal distribution based on the general principle (which I agree with) that we should allow for extreme events.

This op-ed by Zeynep Tufekci has a good summary of election forecasts and their limitations.

Another way of saying the bit about the extreme tails is that there’s no reason to believe in normally distributed tails; they’re just a default assumption, which we kept in our model because, at the time we were building it, the tails didn’t matter much for the forecasts we were focusing on.
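
To make this concrete, here is a minimal sketch of the difference the tails make, holding the 50% interval fixed. The numbers are illustrative (the [53.3%, 55.3%] interval quoted above and a t distribution with 3 degrees of freedom), not our actual forecast distribution:

```python
# Illustrative only: hold the 50% interval at [53.3, 55.3] (median 54.3)
# and compare the probability of an extreme outcome under a normal
# versus a fatter-tailed t with 3 degrees of freedom.
from scipy import stats

median, half_width = 54.3, 1.0                    # 50% interval = median +/- 1 point

sigma = half_width / stats.norm.ppf(0.75)          # normal scale matching that interval
scale = half_width / stats.t.ppf(0.75, df=3)       # t_3 scale matching the same interval

print("P(Biden < 48%), normal:", stats.norm.cdf((48 - median) / sigma))     # ~1e-5
print("P(Biden < 48%), t_3:   ", stats.t.cdf((48 - median) / scale, df=3))  # ~0.009
```

With the central interval held fixed, the normal puts roughly a 1-in-100,000 chance on Biden falling below 48% of the two-party vote, while the t_3 puts about 1% there: same center, very different tails.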

The challenge, as always, is in the details. But let me divide the modeling decisions into two parts: the decisions that could’ve been made ahead of time, and decisions based on information occurring during the campaign.

Decisions that could’ve been made before the campaign began: Fundamentals-based forecast and its uncertainty; scale and covariance matrix of historical polling errors and opinion shifts during the campaign; behavior of tails.

Decisions based on information occurring during the campaign. Another way of saying this is, information not encoded in the fundamentals and the polls. We’re not so worried about conventional news events, because, even though they’re not immediately reflected in the fundamentals and the polls, they should eventually get there. If a candidate has some unexpected bad news causing some shift in support or motivating his opponent’s voters or demoralizing his own, that should eventually show up in the polls—not perfectly, because polls aren’t perfect, especially in their projections of turnout—but pretty much, especially in our era of polarization where we don’t expect big shifts anyway. For example, I don’t think people are expecting the recent news about the president and the Turkish bank to change many voters’ decisions.

But there is some news that would’ve caused us to alter our forecasts: for example if one of the candidates were to die or become incapacitated, or if major third-party candidates had arisen. What about the recent news of possibly record turnout and threats of unprecedented vote suppression?

The key question is whether to put such unexpected events in the “error term” or whether to estimate their effects and shift the forecast.

With our 96% and 99%+ forecasts, I feel like it’s a little of all these things.

Most easily, we could make these numbers less extreme by allowing longer tails, basically saying that in any election our forecast could be off by 5 or 6 percentage points, say, just cos anything can happen: war or peace can break out a week before the election (remember Kissinger in 1968!), a pandemic could happen, candidates can die, courts can issue unexpected rulings, unexpected polling errors of a form not seen in the past, etc. It’s hard to assign probabilities to all these things, but we can crudely estimate that the probability of something unique and unanticipated happening is closer to 10% than 1%. After all, in this year alone we had the pandemic and a change in the Supreme Court, in other years we’ve had fluid situations in international relations, etc. Not all these events will necessarily cause big unexpected changes in the vote, but they’re not necessarily accounted for in the error term fit to past data. But really that’s the easiest part of the equation, to make the tails wider so our 96% and 99+% become 90% and 95% or whatever.

Maybe also our 50% and 80% intervals should be wider. That’s harder for me to judge, and this gets us into decision-analysis territory: do we want “conservative” inferences so that our 50% intervals contain the truth more than half the time? See section 3.2 here. My thought right now is that if we fix the tails, then overconfidence in the central intervals is less of a worry. But ultimately these are judgment calls. I don’t buy Fivethirtyeight giving Biden a 6% chance of winning South Dakota (see graphs here), but at some point you just have to put your numbers out there and accept they won’t be perfect. We’re not bookies who are in the business of offering betting odds.

The hardest call is whether to directionally shift the predictions based on information not in the fundamentals and the polls. Both we at the Economist and the team at Fivethirtyeight have forecasts that are set up to run on their own: we make some judgments about what polls to include and we reserve the right to fix bugs and conceptual problems with our model (as we did once; see August 5th update here). But what about allowing the model to be adjusted for news that won’t be reflected in the fundamentals or the polls, for example the possibility of votes not being fully counted? I’m kinda thinking a forecast should allow for this, but I’m not sure how to do it in practice.

As noted above, I think that with wider central intervals and wider tails we could lower that Biden win probability from 96% to 90% or maybe 80%. But, given what the polls say now, to get it much lower than that you’d need a directional shift, something asymmetrical, whether it comes from the possibility of vote suppression, or turnout, or problems with survey responses, or differential nonresponse not captured by partisanship adjustment, or something else I’m forgetting right now. But I don’t think it would be enough just to say that anything can happen. “Anything can happen” starting with Biden at 54% will lead to a high Biden win probability no matter how you slice it. For example, suppose you start with a Biden forecast at 54% and give it a standard error of 3 percentage points, which has gotta be too much—it yields a 95% interval of [48%, 60%] for his two-party vote share, and nobody thinks he’s gonna get 48% or 60%. Anyway, start with that and Biden still has a 78% chance of winning (or 75% using the t_3 distribution). To get that probability down below 80%, you’re gonna need to shift the mean estimate, which implies some directional information.
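
Here is that back-of-the-envelope calculation written out. The 54% mean and 3-point standard error are as above; the roughly 51.7% electoral-college tipping point is a hypothetical value, chosen here only so that the arithmetic reproduces the 78% and 75% figures:

```python
# Back-of-the-envelope version of the calculation above. The 54% mean and
# 3-point standard error come from the text; the ~51.7% electoral-college
# tipping point is a hypothetical value used here for illustration.
from scipy import stats

mean, se, tipping_point = 54.0, 3.0, 51.7

print("95% interval:", (round(mean - 1.96 * se, 1), round(mean + 1.96 * se, 1)))  # ~(48.1, 59.9)
z = (mean - tipping_point) / se
print("P(Biden wins), normal:", round(stats.norm.cdf(z), 2))     # ~0.78
print("P(Biden wins), t_3:   ", round(stats.t.cdf(z, df=3), 2))  # ~0.75
```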

Anticipating problems with a forecast

From a strategic point of view, it probably would’ve been better for us to keep our mouths shut on this one: it’s too late for us to change our model (even if we were sure we wanted to), so in the unlikely-but-not-impossible event that Biden loses the election without massive vote suppression, we’ll look bad, and this post won’t really help; and in the more likely event that the election goes as predicted, our forecast would look better in retrospect had we not hedged it like this.

But the thing that really bugs me is that we could’ve seen this coming. Back in July we were concerned that our model was giving Biden a 99% chance of winning the most votes (and a 91% chance of winning the electoral vote). 99% just seemed too high, and this motivated us to go inside our model and look at it more carefully. We found actual bugs in the code along with conceptual errors in how we set up some of the error terms and also we rethought some of the details of our model.

Just to be clear: It’s not that our model said 99% and we didn’t believe it so we fiddled with the model until we got a lower probability. It’s that the implausible (to us) probabilistic forecast gave us the impetus to inspect our model. If we hadn’t found any problems with the model, I think we would’ve kept it as is.

Anyway, Biden was sitting there at 54% of the two-party vote in the polls and in our forecast—the forecast combines the polls with a fundamentals-based election prediction, but it just happened that our fundamentals-based prediction was at around 54% also—and this mapped to an approx 99% chance of him winning the popular vote. Fixing our model didn’t appreciably change our point forecast, but it increased the uncertainty in the national popular vote and electoral college, lowering Biden’s assessed win probability.

Fine. But here’s the mistake. We didn’t play it forward. As I wrote a few days ago:

The polls had been stable in their support for Biden and we were anticipating further stability, so we could’ve simulated a few months of future data, then fit our model to that, and seen the uncertainty interval for Biden’s vote share gradually narrow until losing was clearly outside the range.

So why did we make this mistake (if mistake it was)? I’ve been pondering this, and I think the problem was that we set up our model by testing it on 2008, 2012, and 2016. 2008 wasn’t close and so we didn’t focus so much on our model’s Pr(Obama win), but 2012 and 2016 were close in the polls, so our model gave reasonable-seeming uncertainties. When fitting to 2016, our model predicted something like a 70% probability of a Clinton win, and we thought, sure, that sounds right. And, indeed, given the information available to our model it really had to make Clinton the favorite: she was leading in the national polls as well as in key swing state polls. The model didn’t “know” that the state polls in Wisconsin etc. were overstating Clinton’s vote share; all we told the model was that the polls could be off. The point estimate favored Clinton, so all the model could do was increase the uncertainties and pull her win probability toward 50%; it couldn’t get it below that point. And the probability of 70% seemed reasonable. But I think that was our mistake: to calibrate our sense of the win probabilities on a close election, without thinking about how it would look if the election wasn’t so close.

Again, we had every expectation that Biden would stay near 54% in the polls, as that’s where our fundamentals-based forecast was anyway, and we did say that if the polls stayed there, Biden’s win probabilities would increase as the model’s predictive uncertainty narrowed with the approaching election. But we hadn’t fully thought through the math, or asked whether we’d stand by predictions such as 96% or 99+% at the end of October.
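
That math is easy to sketch. In a stylized version of this kind of forecast (not our actual model, and with invented variance numbers), the predictive uncertainty is a fixed polling-error component plus an opinion-drift component that shrinks as election day approaches, so a stable 54% poll standing mechanically pushes the popular-vote win probability toward 99+%:

```python
# Stylized illustration, not the Economist model: Biden's poll standing is
# held fixed at 54% while the opinion-drift part of the forecast uncertainty
# shrinks as election day approaches. The variance numbers are made up.
import math
from scipy import stats

poll_mean = 54.0         # two-party share, assumed stable
poll_error_sd = 1.5      # systematic polling error: doesn't shrink with time
drift_sd_per_day = 0.25  # sd of daily opinion drift (hypothetical)

for days_left in (120, 60, 30, 7, 0):
    sd = math.sqrt(poll_error_sd**2 + drift_sd_per_day**2 * days_left)
    p_win = stats.norm.cdf((poll_mean - 50) / sd)
    print(f"{days_left:3d} days out: P(popular-vote win) = {p_win:.3f}")
```

A forecast that says 90% in July, with stable polls, is already committing itself to something like 99% by late October.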

Sure, we weren’t anticipating the scale of what we’re hearing about efforts at vote suppression . . . but, again, we were anticipating the possibility of some unlikely events.

If we want to go with our 96% and 99+% forecasts, that’s fine. But, if we don’t want to stand by these probabilities now, then it’s on us that we didn’t think through the implications back in July.

This is related to the martingale property of a Bayesian forecast (see section 1.6 of our article): any forecast is not just a forecast of an outcome, it can also be viewed as a probability-weighted average over all anticipated intermediate forecasts—which implies that we should think about those intermediate forecasts when building our model. Again, we kinda did that, but in retrospect I think we relied too strongly on the three previous elections without fully thinking about what might happen this time.
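
In symbols (this is the standard property of any coherent Bayesian forecast, not something specific to our model): writing p_t for the forecast probability of a Biden win given the information I_t available at time t,

```latex
p_s = \mathrm{E}\!\left[\, p_t \mid \mathcal{I}_s \,\right] \qquad \text{for all } s < t,
```

so a 90% forecast in July is coherent only if the late-October forecasts we consider plausible also average out to 90%.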

As they say in poker, don’t evaluate the play, evaluate the strategy.

72 thoughts on “So, what’s with that claim that Biden has a 96% chance of winning? (some thoughts with Josh Miller)”

  1. It seems there is another way of making predictions here that would allow the tail events to be modelled with greater reference to the data. Currently, these models predict percentages for the two sides, but another strategy would be to predict vote totals. I understand there is much less sensible data for this prediction, and that turnout models are often baked into pollsters’ predictions of percentages. But the current setup means it is tricky to work out if 90/95/99% outcomes are ‘correct’. Predicting vote totals means your hits and misses are more easily calibrated to data.

    If this suggestion is foolish for a reason other than missing data, I’d be interested to know why. If it is difficult because of lack of data, that could be used to tell us something about the uncertainty we’re trying to characterise.

    • Andrew is always incentivizing people to use more fine-grained data (e.g., using the points scored as the outcome instead of winning/losing), so this ought to be something he would support.

    • Paul, David:

      I agree that the model could be expanded to predict turnout as well as vote share. This would just require some more work, and I’m not sure that much would be gained from a turnout prediction if we don’t have good data and a good model. Political professionals predict turnout, that’s for sure.

  2. Do you think enough has been done with the visualizations and presentation to communicate that this is a vote intent model but that it isn’t a model of counted ballots?

    • Anon:

      I dunno, this is really a 2020 thing. Before the current election, it was assumed that pretty much all the votes would be counted. There have been some notorious exceptions, but this is the first time that people are anticipating potentially large gaps between the votes that people cast and the votes that are officially counted.

        • I think the questioner was asking a different question. It was not about counting but more in the nature of “Hidden Trump” or “Hidden Biden” voters polluting the data, i.e., for social reasons, they may give an answer opposite to what they intend to do. There are news reports of Democrats who may vote Trump because they believe his policies benefit them or Republicans who may vote Biden because they believe Trump is a phony/incompetent/ugly Republican.

        • I mean voting irregularities due to post office issues, local election boards, state or supreme court rulings, open suppression, long lines, covid, etc.

          The fact that the model takes so little of this into account and doesn’t communicate it to readers is malpractice, imo.

        • Anon:

          Right there on the forecast it says, “Our model is updated every day and combines state and national polls with economic indicators to predict a range of outcomes. The midpoint is the estimate of the electoral-college vote for each party on election day.” I think it’s pretty clear what information is used in the model. Nothing there about post office issues, local election boards, state or supreme court rulings, open suppression, long lines, covid, etc. No model is perfect. There’s plenty of places in the newspaper where you can read about these other issues.

          It would be fine to try to incorporate all those other factors into a forecast, and . . . you can do it! Our code, data, and forecasts are available online, so you can do some adjustments and make your own forecasts accounting for the possibility of cheating or whatever.

        • Thank you; I misunderstood your word “intent”. Then, my question to Andrew is, Has he given any thought to potential poll pollution by deceptive responders? Many Conservatives are claiming that polls are useless this time due to “Hidden/Shy” Trump voters.

      • Can vote counting irregularity be treated as a new source of error for the likely voter model? It’s another factor that causes someone who is a likely voter to end up not being counted as an actual voter.

    • Andrew, as an admirer, I’m expecting a very serious retrospective from you. I had been tormenting everyone with the Politico article featuring the Trafalgar forecast. Why did no one else think up clever questions, and dismiss turnout as important? On the rationality of voting Trump supporters get trolling pleasure from the act of voting, yes? Not those out to defeat him.

      This isn’t a tail event and that has to be faced

      Whether it might have helped avoid the result is another question

  3. I’m not a statistician, but it seems like any model would be farther off at the tails, because at the tails is exactly where information that simply isn’t modeled would change the result. Intuitively that would be true for any model, but maybe me not being a statistician gives me less insight.

    • It’s certainly true that the tails are the hardest part. By definition tail events are rare, so you aren’t going to have good historical statistics; and to the extent that you do, as Andrew has pointed out it isn’t clear that some weird thing that happened when Martin Van Buren was elected is going to be relevant now even in a statistical sense.

  4. I’m a bit confused by this: “Then again, our backfitted predictions seem to be overconfident . . . so I don’t fully believe our numbers either.”

    If I look at your back fits you give p_win(Clinton)~67% and p_win(Obama(2012))~78%, compared to 538’s 71/92 for these years IIRC. 538’s predictions (I know their model changes each cycle, but still) are touted by its founders as being more uncertain than other models, hence the fact that they were clear that Trump had a decent chance. In what sense are your backfits “overconfident”?

    • David:

      When I wrote that our backfitted predictions seem to be overconfident, I was referring to state-level predictions: there were more state-level outcomes than expected that were outside our model’s 95% and 99% predictive intervals.

      • I see-thanks. Is there a sensible way to alter the state-level distributions (tails) such that your backfits are not overconfident, and then see what the current model predicts? Does it make much of a difference?

        • Yeah, of the things you mention, training the tails on old state data is the only one that seems both like an obvious thing to do and not like a special kluge. I think that the strongest part of Nate’s strategy is his training on old data. Of course, it looks like he’s also got some weird state tails, with too much independence.

  5. This whole post confuses me a bit. I think the attempt to portray the results as the probability of X winning, or the probability of the vote intent, is somewhat misleading. What your models do is make fairly sophisticated attempts to model what we would expect based on past elections (and what we are able to measure). To be sure, there are a few novel uncertainties in the current election. I don’t think it is helpful to try to calibrate your predictions to include these, as it threatens to undermine the integrity of what you have done by “polluting” it with quite subjective judgements. I’m not against making subjective judgements, but I’d try to keep them separate from the more soundly based statistical estimations. Correlations between states, non-response bias, etc. can be quantified, and I believe your models do a better job than all the others I have seen. Do I believe that Biden has a 96% chance of winning? No, but I don’t think it makes sense to calibrate your model to produce a lower estimate. That would require moving away from the fundamentals you have so carefully tried to model and including novel features that we have no statistical basis for. What is the probability that mail in ballots will be contested and the matter will be resolved in the courts? Then, what is the probability that the courts will rule in a particular way? The election may well hinge on such things, but I don’t think we have a basis for estimating that probability, nor do we even have a basis for estimating the probability that various state courts will rule in similar ways.

    I’d suggest separating these non-statistically based “probabilities” from the ones that derive from your model. I’ve made plenty of subjective predictions (not about elections, but about a variety of things through consulting work), but expert opinion (usually I use more than one, if possible) is clearly different from building a statistical model. For example, I have a model predicting the effect of a tariff war on soybean prices. The statistical part is a time series model of soybean prices. I overlay that forecast with more subjectively modeled impacts of a trade war (which is not reflected in the past data) on supply and demand, and market elasticities. The two combine to form a prediction, but the two parts are clearly separated.

    I can’t understand why you are trying to combine the two to get a single probability distribution for the election results. It threatens to both undermine the credibility of your model, as well as overestimating the credibility of the subjective factors.

    • Dale:

      Our forecast, which is based on fundamentals, past elections, and polls, is a forecast of vote intentions; that is, it’s a forecast of what would happen if all the votes were counted correctly. 2020 has this new feature that people are threatening to not count all the votes. It’s legitimate to forecast both these things. I agree that it makes sense to separate these issues, and we do so. If someone were to combine the two, then I recommend that they clearly explain where the different assumptions are entering into the prediction.

      • It would be neat to have a supplementary tool where you throw in some unmodeled parameters so that readers can choose their own adventure by adjusting them. These can relate to throwing out votes, structural choices for the tails, etc.

        • Josh:

          We have all the R and Stan code so people can do what they want!

          But, yeah, we could put the code on a Google Colab notebook with some instructions so that it would be easy for users to tweak the Stan code or the covariance matrix and see what they get.

  6. Maybe this is essentially what Dale wrote above.

    Maybe what I’m saying is totally obvious…

    But it seems to me there are two attempts in play here that overlap. One attempt is to evaluate the election outcomes if the polls are accurate. That is what you’re modeling. The second is to evaluate the chances that the polls are accurate. You can’t really model that.

    At some level, this current election is unique in terms of the accuracy of the polls. Maybe fewer people are answering their phones. Maybe demz are relatively less likely to turn out than pubz because of the pandemic. Maybe more pubz will turn out because they think that wearing a mask is an infringement on freedom. Trump inspires a cult mentality like no other American presidential candidate. Maybe Trump is more focused on black and Hispanic turnout than previous pubz. Maybe social media will have an unprecedented effect. Maybe Fox News will have a stronger effect than any rightwing media ever has before. Maybe the trauma of 4 years of Trump will motivate people to vote against a candidate more than has ever happened before. History isn’t informative.

    I think saying you’re measuring “intent” isn’t right. You’re measuring what those you spoke to told you their intent was, and then relying on pollsters to make adjustments to make that sampling representative based on predictors of limited value. Your modeling of the polling could be absolutely perfect and still be way off on the outcomes.

      • Yeah…

        Pre-COVID, I’d already decided to vote against Trump (and in fact specifically voted in the D primary because I thought Biden had by far the best chance vs. Trump of any of the plausible candidates), but up to that point I wasn’t *really* concerned — thinking that in fact Trump hadn’t done many of the things that were feared (abolishing EPA or the ACA, starting wars with Iran, etc.)

        Things seem much more crazy/high-stakes now, with e.g. comments about stopping vote counting at midnight on Nov 3.

        • Funny, Trump has been worse than _I_ feared. I thought Congress would stop him from doing some of the worst stuff, but in fact they’ve let him do most of it. The stakes seem a lot higher to me now than they did four years ago.

        • Well, see, my primary concern in 2016 was war (I was very skeptical that Clinton would do anything actually effective about climate change, in terms of accelerating transition to renewable energy meaningfully faster than market forces are going to do anyway).

          The stakes do seem higher now, post-March… but then COVID has amplified a lot of stresses. And also shown that the system is less “immovable” than I had thought – I really did not expect states like TX to do stay-at-home orders, even briefly.

    • In general, I agree, but I think at least ‘low Democratic turnout because of the pandemic’ can be ruled out already by early voting data, can’t it?

      At this point I think Trump’s only chance is *really* unprecedented Republican turnout (ie, right-leaning people who historically haven’t voted) in key swing states.

      OK, voter suppression and such, but I think that without a significant polling error in Trump’s favor/really unexpected R turnout it won’t be close enough for that to make a difference. If it were within a point, sure, but if Biden is really +5-6 in PA and +3 or more in AZ…

    • Joshua:

      Not quite. Our model does not assume the polls are correct. It allows for systematic polling errors at the state and national level. Yes, the model is that the systematic polling errors are on the same scale as systematic polling errors in the past, but I think that’s reasonable. It’s not like past polls were perfect either.

  7. I looked up the Turkish bank thing that was mentioned:

    > This account is based on interviews with more than two dozen current and former Turkish and U.S. government officials, lobbyists and lawyers with direct knowledge of the interactions. Representatives for the Turkish government, Halkbank and the White House declined to comment.
    https://www.nytimes.com/2020/10/29/us/politics/trump-erdogan-halkbank.html

    People really have to start demanding higher quality sources of information than rumors from anonymous professional liars. It’s funny they mention there were over two dozen of them, 24*0 = 0.

    And when I read the news, almost all I see is rumors being spread by anonymous “officials” (politicians and lawyers).

  8. It is interesting that Zeynep Tufekci uses weather as the sort of gold standard in forecasting, when I think most people do not actually understand precipitation forecasts. The probability of precipitation is the product of the certainty that precipitation will form and move into the area and the areal coverage of precipitation. A forecast of 40% rain could be either 80% certainty of 50% coverage or 40% certainty of 100% coverage. I am not sure if there is any equivalent in election forecasts except that a top line summary is usually an oversimplification that may not have much utility depending on users’ needs.

  9. I’ve no idea if there’s any way to express this concept in a mathematically rigorous way, but I wonder if it might make sense for a model like this to literally encode the concept of “there’s a factor that our model completely failed to account for”. It would probably have to be a pure judgment call as far as what probability to assign, but it would be based on the same judgment that you’re using when you conclude that 99% certainty is too high.

    So you could pick a threshold – say, decide that any forecast giving a >90% probability for the general election is overconfident – and then literally encode that in the form of allowing a 10% chance of “weird s*** happens that we couldn’t foresee”…

    • Stuart:

      Yes, you could do that. Just to be clear: 10% chance of something weird happening would not transfer to adding 10% to Trump’s chance of winning, as the “something else” could favor Biden, or favor Trump but not enough to give him the victory, or all sorts of things. And sometimes we have enough information that we really can talk about being 99% or even 99.9% sure (for example if offered a bet about the Jets winning the Super Bowl this year).
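
      To make that concrete, here’s a toy mixture calculation (the 10% weight, the 4% model probability, and the 50-50 “weird scenario” odds are all just illustrative numbers):

      ```python
      # Toy mixture: a 10% chance of "something weird" does not add 10 points to
      # Trump's win probability, because the weird component has its own odds.
      # All numbers here are invented for illustration.
      weight_weird = 0.10
      p_trump_model = 0.04   # the model's probability absent surprises
      p_trump_weird = 0.50   # a guess at Trump's chances if an unmodeled shock hits

      p_trump = (1 - weight_weird) * p_trump_model + weight_weird * p_trump_weird
      print(p_trump)   # 0.086: higher than 4%, but nowhere near 4% + 10%
      ```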

      Part of this is that we recognize that models are conditional on the data and assumptions that go into them. What made the forecasters look bad in 2016 was not just that the predictions were so wrong (Trump won the primaries after being assigned a 2% chance of getting the nomination, then he won the election after being assigned a 30% chance of winning) but that they were wrong within the parameters of the model. 2016 was a weird election but there should not have been anything model-breaking about it. Novelty candidates had run in the primaries before. State polls had been wrong before. Campaigns had been rocked by last-minute news before.

      Being able to make a reasonable forecast within the parameters of the model is a challenging enough task that I think it’s worth trying to do it right.

  10. First: My favorite thing about these election model posts has been your openness in grappling with all these different issues and how they impact your thinking. I think that is very important and also incredibly useful as an educational exercise for forecasting in general. So thank you for putting all this effort in and making it available publicly.

    Second: Piggybacking on the Zeynep Tufekci writing, I think these issues are not “model” issues but society issues. Because of this, trying to incorporate them in a model may honestly do more harm than good (even if in the long run the model is more “accurate” in an abstract sense). Incorporating the percentage chance that a government will interfere in a democratic election by suppressing voter turnout or just throwing legal votes away without counting them is emphatically not what the role of these models should be. In fact, it seems plausible that doing this gives cover to those parties acting in bad faith (“see, we purged the voter rolls and our chance of winning went up by 10%!”). I think you are better off transparently trying to quantify “voter intention” as you’ve sometimes discussed, as this is both closer to what you are actually measuring and also serves a more beneficial role in society writ large.

    If there is a realistic possibility that something crazy is going to happen, particularly if it’s something driven by human actors and not random like a hurricane or a candidate dying, then that should be discussed in contrast to the predictions of models (and fought out in the press, courts, and public sphere) and not incorporated into the models.

    This is one place where predictions about politics may differ from prediction in general. But I do think it’s always worth considering what the actual point of building a model is, beyond trying to make sure one number is close to some other number in the future.

    • To put it in a positive form, you are arguing that published models should predict the winner of a free and fair election. I could not agree more. The fact that we cannot be sure that an election in the US will be free and fair, should be stated emphatically and often, if the modeler suspects it to be the case.

  11. “But what about allowing the model to be adjusted for news that won’t be reflected in the fundamentals or the polls, for example the possibility of votes not being fully counted? I’m kinda thinking a forecast should allow for this, but I’m not sure how to do it in practice.”

    Couldn’t an analysis like Heidemanns’ have been incorporated into your model? The ballot rejection issue is not a new one. Trump has been saying for months that he believes mail-in voting is fraud. You may not be able to model the magnitude of the now almost certain legal challenges Trump will roll-out on Wednesday, but Heidemanns shows that some level of vote rejection is quantifiable.

        • I don’t know about “should”. I am pretty skeptical that the chance of the election being stolen (successfully) is equal or greater than the chance of a legitimate Trump win. The things I see mentioned are either almost certain to be thrown out by even strongly-Republican courts or too small to affect the outcome unless this is a 2000-close election (seems very unlikely).

          And *incorrect* attribution of a Trump win to fraud could also be very damaging.

  12. The normal distribution works fine for a close election. The problem is correlation among states. Although historical data will not support all the moving parts in your model (including 52*51/2 covariances), most of them are unimportant.

    You could simulate your model, and run a logit of outcome on the national vote share and shares in swing states. This gives one critical linear combination that you need to forecast, both mean and variance. Perhaps you could then estimate the variance of this number using precinct-level voting history or polls. One or two principal components of Pennsylvania, Florida, Wisconsin, Michigan, Arizona, and North Carolina should explain this election. Just double-weight Florida, and forecast the candidate that wins the majority.

    Blackjack players tried complicated nonlinear models before converging on linear high-low systems. Macroeconomic forecasters abandoned overparametrized structural models in favor of vector-autoregressions. Ray Fair’s election model is simplistic, but transparent.

    The opacity of complex models reminds me of a mobster with a Ph.D.: “I’m gonna make you an offer you can’t understand!”

    • KL:

      We’ve looked at all the things you suggest. The trouble is that the number of past national elections is small, so there’s not much direct data on the possibility of extreme events.

      You can feel free to do something like a principal components analysis or double-weight Florida or whatever. To me, it’s more transparent to model all 50 states using the data we have available. Rather than double-weighting Florida, we estimate the vector of 50 vote shares, and any weighting of Florida or whatever is implicit in our inference for the electoral college winner, which is a nonlinear function of the vote shares.

      I have a similar feeling about poll aggregation methods that do various forms of weighting and windowing to account for data quality and time trends. I find it clearer and a better way forward to consider the polls as data that provide some information about underlying public opinion. When we perform inference for the opinion time series, any necessary weighting and windowing is done implicitly.

      This is the whole “Bayesian data analysis” approach to statistics: instead of trying to come up with a data manipulation or estimator that gets us to where we want to be, we set up a probability model, a joint distribution for data and underlying parameters, and then we perform inference. Then we have to check our inferences etc.

      You can feel free to prefer a non-Bayesian, non-modeling approach to prediction—many people do! That’s just different from what you’re gonna get from us. It could well be that the non-modeling approach works better for blackjack, while the Bayesian approach works better for elections and sports.

      • > a probability model, a joint distribution for data and underlying parameters, and then we perform inference.
        Starting to think many do not clearly see how such probability modeling implies what the weighted combinations of sampling units must be to best approximate (sometimes with no error) the inferences from the modelling.

        For instance, from the referee’s report back in 2016 for http://www.stat.columbia.edu/~gelman/research/unpublished/Amalgamating6.pdf
        “Further it would be helpful to know (without referring to O’Rourke) [so they don’t?] a little about how these best weights are implied by the error distribution (if restricting to a linear combination, the weights would have to be equal by symmetry [which is incorrect]).”

        We did not seem to think we had to explicate that in any detail…

        • Thanks again, Andrew and Keith. My comments are mostly about diagnostics for complex models. Andrew’s critiques of Nate Silver’s model were similarly based on diagnostics. Basically, Nate misspecified state correlations, and compensated by using fat tails. While I agree with your criticism, it hardly matters whether California’s chance of voting for Trump is 1/1,000 or 1/1,000,000. The correlations among Pennsylvania-Wisconsin-Michigan and Florida are much more important.

          In Markowitz Portfolio Theory, if the mean is mu and the covariance is Sigma, then the mean-variance optimal portfolio is proportional to Inverse(Sigma)*mu. You can’t estimate this classically for thousands of stocks, so quant models use parsimonious reduced-rank models. It would be interesting to compare models based on the 51-dimensional mu and Sigma for 51 state vote shares. Two models could be compared by adjusting their means and covariances.

          For example, one could compare models based on the mean and variance of vote shares for Pennsylvania + Wisconsin + Michigan + 2*Florida. This would distinguish between different modelling of the means, and different modelling of the variances and correlations. The correlations are essential to Trump winning with a Red Wave, or just getting lucky in one or two swing states.

    • Hn,

      I use the Zombies tag for topics that just keep coming back, that seem unkillable. We’d hoped to have resolved all our issues with the election model months ago, yet here it comes up again!

      P.S. I hope that all or at least most of our Zombies posts are informative!

  13. Hi Andrew, writing to you as LSE MSC Econ grad in the States.

    Perhaps your model is not updating the current polling accurately, as Trump has had a good few weeks. In the current Real Clear Politics polling for Pennsylvania, 3 of the last 9 polls have Trump leading there, and if he wins PA he probably wins the election as his positive results there also signal he will win the other key states that are easier.

    These 3 outfits are probably using different methods to identify voters as they have been consistently apart from the other pollsters, so it’s not just a sampling issue which can be corrected by aggregating all the models into 1 huge sample.

    So the betting line of Trump at 35% looks about right.

    • Michael:

      It’s not a sampling issue. The issue is that a model such as ours or Fivethirtyeight’s is an attempt to put together all this information more systematically. You can always pick things out of the data but that won’t tell the whole story.

    • lol. Michael, why did you click send on this message? Obviously Gelman’s model is going to be better at aggregating polls than this back-of-the-envelope-already-knew-the-answer “calculation” you just made.

  14. > It’s not that our model said 99% and we didn’t believe it so we fiddled with the model until we got a lower probability. It’s that the implausible (to us) probabilistic forecast gave us the impetus to inspect our model. If we hadn’t found any problems with the model, I think we would’ve kept it as is.

    While I believe that you earnestly believe this, I think the reality is more nuanced. You will always find some problems with the model if you look closely and carefully enough. But as long as the model is providing plausible numbers, you simply don’t feel the need to examine it as rigorously. And of course, “problems” with the model aren’t binary and more of a sliding scale with plenty of room for people to disagree on whether something is a judgment call or a problem.

  15. The Economist Model and the fivethirtyeight.com model are perhaps excessively complicated and full of “inside baseball” assumptions. I prefer this one of August 5, 2020, which is much denigrated by the statistics/psephologist cognoscenti:

    https://www.nytimes.com/2020/08/05/opinion/2020-election-prediction-allan-lichtman.html

    Allan Lichtman, if I read him correctly, predicted a Biden win in the popular vote of about 7.7% [7/13-6/13 = 1/13], which, more or less, matches those of the very sophisticated and involved prediction methodologies [7.9% for fivethirtyeight.com, and 8.4% for the Economist Model].
    Unfortunately, none of the models explicitly consider voter intimidation by private militia groups and Trump-friendly judges/justices.

  16. Andrew,

    First, your openness and transparency in self criticism is an example that I would love to see others follow. Better to explain what aspects and assumptions of your model give you concerns than to “fiddle” to get an answer you like.

    In terms of interpreting the output of your model, I think that there are a few conclusions (looking, Bayesianly at how the assumptions lead to the results).

    A. On the eve of the election, Biden is ahead by more than the typical polling error, so has a very good chance of winning. This is, of course, obvious, but all other comments have to be made in light of the main direction.

    B. Because of the Electoral College, and the fact that several key ‘swing’ states have lower margins for Biden, the chance of losing the EC is significantly greater. Some analysis that you did shows this to be roughly a 3-4 point effect; 538’s model is about the same. However, the polls – both national and swing states – show that he has a comfortable margin even after this effect.

    C. Net, the mix of polls vs. fundamentals didn’t matter this year.

    D. The net of A, B and C is that it would take a much larger than usual polling error for Biden to lose. Hence the 96%.

    E. The big question is “How likely is a much larger than usual polling error?” You discuss a few possible answers:

    Economist Model’s answer: “The likelihood of a big error should be based on the historical observations of polling errors” — reasonable, but gives us a nagging sense that there may be fat tails / unknown unknowns out there.

    538’s answer: “There are many unknowns this year, so we put in lots of proprietary factors to fatten the tails” — hence fatter tails and output that is qualitatively very similar, but with the probabilities of popular vote and electoral vote wins shifted. Also reasonable, although you have pointed out many concerns in the details.

    My general reaction to any very confident prediction is to take it as a possibly reasonable distillation of the known data, but to discount the confidence to account for unknown factors.

    At this point, issues in how voting intention translates to actual votes and how actual votes translate to counted votes probably outweigh the uncertainty due to sampling error or polling methodology. And I have no idea how big, or in what direction, this effect may be.

    May we all know the answers soon.

    Most important, if you haven’t already, be sure to vote tomorrow!

  17. Aha, this made me think of a possible explanation for the weird 538 tail results. Say that Nate trained the interstate correlation matrix on old deviations from model predictions. That will be dominated by many small weakly correlated errors. The correlations will become much more important for big errors. In other words, although he uses realistic fat-tailed distributions (non-Gaussian higher moments) for single-state probabilities, perhaps he coupled them with mere second-moment correlations. I haven’t seen his algorithm and don’t know if this is correct.

  18. First of all, I really appreciate the transparency and explanations behind your model. If only everyone held themselves to such standards!

    Second, I disagree with the result. Are you accepting bets on the outcome of the election? Say, 5 to 1 odds (e.g. my $10 to your $50; I get $50 if Trump wins the electoral college)? Even with the uncertainty in mapping vote intention to actual votes, that seems like a pretty good deal, if you stand by a 96% probability.

  19. Andrew:

    Thank you for continuing to discuss the issues of the Economist model’s overconfidence that I’ve been raising for a while on this blog. My key reaction is to this statement of yours:

    I think that with wider central intervals and wider tails we could lower that Biden win probability from 96% to 90% or maybe 80%.

    I agree! I think that correcting the Economist model’s overconfidence about 2008-2016 could meaningfully reduce Biden’s stated win probability, but not below 80%. That’s what my analyses suggested (see note at end) and, more importantly, it sounds like that’s what you found too. So that’s the headline for me: After correcting the Economist model’s miscalibration, Biden’s stated win probability would still be greater than 80% and possibly greater than 90%.

    To me, those numbers are the most important part of your post because the most important topic is Biden’s (and Trump’s) win probability.

    For that reason, I also greatly wish that the Economist website had disclaimers about how the prediction model is of voting intent only, instead of claiming to predict every plausible election outcome.

    More generally, thank you for the repeated, thorough discussions of the strengths and weaknesses of the Economist model. Few in academia have your admirable candor, and your personal accounts provide an indispensable look at how applied statistics actually happens at the highest levels. Your detailed commentaries are a wonderful resource.

    On the other hand, no amount of discussion, regardless of its candor and detail, can substitute for actually correcting problems. I believe the problems with the Economist model deserved correction because the discrepancies between its predictions and the actual 2008-2016 election results are both statistically evident and large enough to be practically important. Together, those two ingredients generally mean a model should be corrected.

    — More about model miscalibration and model overconfidence —

    Statistically, the Economist model assigns a p-value below 1 in a million to the actual state results of the 2008-2016 presidential elections — so, according to the model, the 2008-2016 state results are so unlikely they should never have actually occurred. (The p-value math here is the same as in my previous blog comments.) Calibration plots emphasize this: the model’s overconfidence is not limited to one or two states in the extreme tails like 99.9% predictive intervals or 99% predictive intervals, but seems to extend all the way down to 80% intervals and beyond.

    Here are my calibration plots showing this. They are the same as I’ve posted before.

    https://i.postimg.cc/c4KTB1hB/tail-check-v4.png
    https://i.postimg.cc/prShdHn2/tail-calibration-full-range.png

    Further, attempts to improve calibration seemed to substantially change candidates’ win probabilities and were therefore practically important. This is supported by my analyses (see note at end) and, moreover, must have been true for McCain’s 2008 win probability if it is true (as you say) for Trump’s 2020 win probability.

    My criticism here is not really about the miscalibration itself. Mistakes happen and, hey, in my checks of your model I made mistakes too, twice thinking I found bugs in your code when they were actually my own mistakes!! However, I had thought you would correct the miscalibration, and not having done so I feel like the Economist model is an impressive boat I’m walking past and admiring, until I get around the corner and see a 2-foot hole at the water line.

    — Lessons learned —

    Your blog post focuses on “lessons learned,” so maybe I can add two that could be useful:

    First, I think the shortest path to spotting miscalibration would’ve been to estimate the p-value your model assigned to 2008-2016 state results. That would’ve quickly suggested a problem, especially since the p-value for the set of state results remains extreme after dropping worst-fit states.

    Second, I suspect the state miscalibration is substantially due to overfitting. I’m speculating here — much as you’ve speculated about 538 — and I’m not sure of what happened. However, if you are inspecting the calibration and performance of your model against characteristics X and you overfit to those characteristics, then a side effect is often that you magnify any miscalibration and performance problems with characteristics Y that you have NOT been looking at. In this case, you were inspecting many characteristics, but not the calibration of your state-level prediction intervals, which ended up overconfident to what seems like an unnatural degree. For example, in the prediction interval calibration plot above, you can see that state results were outside of 75% prediction intervals much more often than they probably should be. I see how fitting a model with too-narrow tails can produce overconfidence for 95% intervals, but overconfidence at 75% prediction intervals is unexpected and shouldn’t happen regardless of whether tails are too narrow or not – which suggests an unnatural side effect of overfitting.

    — Summary of analyses —

    Three of my analyses suggested heavy tails would greatly increase Trump’s 2020 win probability. All of them are imperfect but suggestive.

    * Your model’s state predictions are essentially normally distributed on the logit scale. Working with your 40,000 simulations of 2020 state results on the logit scale, I replaced each of the simulated vote share values with the equivalent percentile from a t distribution of the same location and scale parameters as the state’s normal distribution, but tails wide enough that they did a much better job of calibrating 2008-2016 (degrees of freedom = 3.5); see the sketch after this list. This maintained between-state Spearman correlations, but not Pearson correlations. When Trump was given an 8% win probability on the Economist website, this adjusted analysis increased his win probability to 14%.

    * I ran your 2016 backtesting script with 2020 polling data and t distributions with 3.5 degrees of freedom in error terms. (I could not run your 2020 script itself because you do not share that publicly). That resulted in Trump having a 14% win probability at a time when he was given 5% on the Economist’s website. However, I do not know how much this difference is due to the heavy tails or different 2016 priors and other characteristics of the 2016 backtesting model.

    * Currently, Trump is given a 4% win probability on the Economist website. So, the percentage of state results that fall outside of 96% prediction intervals in 2008-2016 backtesting provides very rough intuition about the national win probability a calibrated model might give Trump. The percentage is 9%. Or, if you prefer to look at half the percentage of states outside 92% intervals, the percentage is 7%.
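
    For concreteness, here is a sketch of the quantile-replacement step from the first analysis above, applied to toy correlated draws (two hypothetical “states” on the logit scale) rather than the actual Economist simulations; in the real analysis the same operation would be applied to each state’s column of simulations:

    ```python
    # Sketch of the quantile replacement described in the first bullet above,
    # run on toy data rather than the actual 40,000 Economist simulations.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sims = rng.multivariate_normal(mean=[0.16, 0.08],
                                   cov=[[1.0, 0.8], [0.8, 1.0]], size=40_000)

    df = 3.5
    fattened = np.empty_like(sims)
    for j in range(sims.shape[1]):
        loc, scale = sims[:, j].mean(), sims[:, j].std()
        u = stats.norm.cdf(sims[:, j], loc=loc, scale=scale)  # percentile under the normal
        fattened[:, j] = loc + scale * stats.t.ppf(u, df=df)  # same percentile under a t_3.5

    # The transform is monotone within each column, so rank (Spearman)
    # correlations are preserved while Pearson correlations are not.
    print(stats.spearmanr(sims[:, 0], sims[:, 1])[0],
          stats.spearmanr(fattened[:, 0], fattened[:, 1])[0])
    ```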

  20. I’m sure I’m not the first to note that a high-profile poll or prediction can actually affect the likelihood of that event occurring. For example, if it looked like a landslide, I may not bother voting, and so it is less of a landslide. What are the ethics or considerations around making such a forecast?

  21. I realize that you are just rounding within the margin of error, but it looks innumerate to say: “around 1 in 20 or 4%”. 1 in 20 is 5%, and 4% is 1 in 25.

  22. Assigning states according to the already available results (I used the NYT projections) in the https://www.ricardofernholz.com/election/ scenario explorer based on the simulations from The Economist I get the following conditional probabilities:

    Scenario probability 12%

    Alaska R 99.7% – Arizona R 70% – Georgia R 70% – North Carolina R 67% — Nevada D 79% – Pennsylvania D 79% – Wisconsin D 94% – Michigan D 95%

    Biden win probability: 84%
