OK, this is all getting a bit recursive, but I think there are some more insights to be squeezed out of this particular lemon, so here we go . . .
Yesterday I wrote a long post on polling averages and political forecasts, focusing on the differences between Fivethirtyeight, whose model gives each party a 50% chance of winning, and the Economist, where the Republicans are at 75%.
In comments, Michael Weissman pointed to a blog post yesterday by Nate entitled, “Why I don’t buy 538’s new election model: It barely pays attention to the polls. And its results just don’t make a lot of sense.”
I’m glad to see Nate looking into another group’s forecast in detail in that way. I find this sort of outside perspective to be very helpful. Back in 2020 we did something similar looking at Nate’s Fivethirtyeight forecast—for example, one of our posts was called Reverse-engineering the problematic tail behavior of the Fivethirtyeight presidential election forecast, and it featured lots of graphs and code—and I was frustrated that Nate did not seem to look at or address our criticisms. I get it—he’s a busy guy, also if he were to agree with us about the merits of our comments, it wouldn’t be so clear how to fix his method, so maybe it made more sense for him to just dismiss what we wrote without at the time addressing the problems with his forecast. To be fair, the problems we identified were not major—they were artifacts in the tails of the probability distribution and did not directly affect win probabilities or other headline numbers—and it was during the campaign season when it would not have been easy for him to overhaul his model. Nate writes that Elliot Morris is “too busy to provide a longer explanation” now, which makes sense given that Nate was too busy back in 2020. I have time to write all these blog posts and comments, but that’s part of my job—as a Columbia professor, I have the time and inclination to compose endless explanations, but these guys are in the business/media world and are always hitting urgent deadlines.
In any case, I think it’s great that Nate is offering specific criticisms of the Fivethirtyeight model, and I hope the Fivethirtyeight team can make use of these criticisms and make improvements in their method. I’m also happy to see that Nate’s putting all this on a blog, which allows him to give all the details he wants. Much better than twitter! Nate writes, “statistical models like these are complex and can very easily go wrong. . . . it’s often hard to detect these design flaws through backtesting alone — usually you only learn the hard way once a model is stress-tested under real world conditions.” I agree completely.
Regarding the specifics of Nate’s comments: I see him making two main points.
Nate’s first point is a disagreement with Fivethirtyeight’s fundamentals-based prediction. He writes, “I also think their model gives Biden too much credit for being an incumbent in a polarized era where the incumbency advantage has considerably diminished. . . . My [Nate’s] fundamentals model has Biden favored in the popular vote by roughly 2.5 points, whereas Morris’s has him ahead by 3.3.”
This first point is not such a big deal; as Nate says, “this isn’t that large a difference”—a difference of 0.8 points in the lead corresponds to a difference of 0.4 percentage points in the two-party vote share—and its importance in the forecast reflects the general phenomenon that, when predicting a close election, small shifts in the expected vote translate to large shifts in win probabilities. When Clinton was running against Dole in 1996, for example, a shift in his predicted vote share by 0.4 percentage points wouldn’t have changed the status that he was a strong favorite to win reelection. Recent elections have been very close, though, and 0.4 percentage points can make a difference. There’s not too much more to say here except that, yeah, I guess the Fivethirtyeight team should look carefully at how they include incumbency in their predictive model. The current incumbent has low approval ratings, so it could well be that you’d want incumbency to be a negative predictor in this case.
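To make the sensitivity concrete, here is a minimal sketch, assuming a normal forecast distribution for the national two-party vote share and ignoring the Electoral College entirely. The forecast standard deviation and the "lopsided race" vote shares are hypothetical values chosen only for illustration; neither comes from Nate's model or from Fivethirtyeight's.

```python
from statistics import NormalDist

def lead_to_share(lead_points: float) -> float:
    """Convert a lead in points into a two-party vote share in percent."""
    return 50.0 + lead_points / 2.0

def win_probability(expected_share: float, forecast_sd: float) -> float:
    """P(two-party share > 50%) under a normal forecast (toy popular-vote model)."""
    return 1.0 - NormalDist(mu=expected_share, sigma=forecast_sd).cdf(50.0)

FORECAST_SD = 2.0  # hypothetical uncertainty in the vote share, in points

# Close race: the two fundamentals-based leads quoted above (2.5 and 3.3 points).
close_before = win_probability(lead_to_share(2.5), FORECAST_SD)
close_after = win_probability(lead_to_share(3.3), FORECAST_SD)
close_shift = close_after - close_before

# Lopsided race: the same 0.4-point change in share, far from 50% (hypothetical shares).
lopsided_before = win_probability(54.6, FORECAST_SD)
lopsided_after = win_probability(55.0, FORECAST_SD)
lopsided_shift = lopsided_after - lopsided_before

print(f"close race:    {close_before:.3f} -> {close_after:.3f}")
print(f"lopsided race: {lopsided_before:.3f} -> {lopsided_after:.3f}")
```

Under these assumptions the same 0.4-point change in vote share moves the win probability by several percentage points in the close race and by well under one point in the lopsided one.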
Nate’s second, and larger point, is the relative weighting of the fundamentals-based prediction and the polls. He writes that the Fivethirtyeight model “treats the fundamentals as a strong prior and I [Nate] treat them as a weak one that you should be pretty eager to discard once you get enough polling. And I think one should be wary of strong priors in data-poor environments (only one election every four years) like election forecasting.”
From a Bayesian perspective, I don’t think there’s much daylight between any of us in general terms. It should always be the case that the importance of your priors diminishes as more data arise, and, yes, when data are sparse, inferences will necessarily be more sensitive to priors and you’ll want to examine them carefully.
The disagreements come in the specifics, and they relate to what I wrote in yesterday’s post. To say that the Fivethirtyeight forecast overweights the fundamentals-based forecast and underweights the polls is to say that it’s expressing a model in which there can be large swings in the polls between now and November (that’s A!=B in the terminology of my post from yesterday) and that the polls can have a systematic bias (that’s B!=C).
Nate’s specific disagreement is with the Fivethirtyeight model’s implicit claim that there can be large swings in the polls between now and November. Nate writes, “their estimates of polling movement are derived from polls since 1948 — but polls now are much less ‘swingy’ than they once were . . . in a time of extremely high polarization, ‘drift’ is much less than it once was: the polls hone in toward their final margin earlier since few people’s votes are actually up for grabs.”
So, yeah, that’s the crux of it! The Fivethirtyeight model expresses a lot of uncertainty because it’s allowing for large, Dukakis-versus-Bush-in-1988-style swings in public opinion, whereas the Economist’s model and Nate’s model are making stronger predictions by assuming that swings during the campaign will stay in the narrow range that we’ve seen in recent national elections.
Nate frames his differences with the Fivethirtyeight forecast of 2024 as a disagreement of how much to trust the fundamentals, and that’s part of it. The other part is the model’s uncertainty about potential changes in public opinion.
As Nate points out, Fivethirtyeight’s wide uncertainty about national swings translates to wide uncertainty about state forecasts, for example, “in Pennsylvania, 538’s 95th percentile forecast covers outcomes ranging from roughly Biden +18 to Trump +17.”
One “sociologically” interesting thing to me here is that the criticism that Nate is making about the Fivethirtyeight model in 2024, that its predictions are implausibly wide, lines up pretty closely with criticisms that were made of the Fivethirtyeight model in earlier years when he was running it.
For example, on 27 Aug 2020, I took a look at forecasts for Florida. The Economist’s 95% predictive interval ranged from roughly Biden +16 to Trump +6. Meanwhile Fivethirtyeight’s 95% predictive interval ranged from roughly Biden +18 to Trump +14. At the time, I felt that Nate’s interval was too wide, and it seemed to me that this wide range in the state predictions was there to allow enough uncertainty in the national forecast. Nate presumably believed our interval was too narrow. Fair enough; that’s why there’s room for different forecasters.
Anyway, look at this. In August 2020, Nate’s Fivethirtyeight forecast for Florida had a 95% interval of [Biden +18, Trump +14]; that’s a range of 32. In July 2024, Nate criticizes Fivethirtyeight for producing a 95% interval for Pennsylvania of [Biden +18, Trump +17]; that’s a range of 35. OK, 35 is bigger than 32, but not by much, and we are talking about a forecast that is a month earlier in the cycle.
This is not intended to be a “gotcha” on Nate. It’s perfectly legitimate for him to say that a wide interval for Florida was appropriate in the unprecedented political environment of 2020, while he prefers a narrower interval in the gridlocked rerun election of 2024. As discussed above, these different interval widths correspond to different assumptions about the probability of large, 1988-vintage swings in public opinion. 2024 is different than 2020.
Here’s the thing I want to focus on. In 2020, the Fivethirtyeight forecast gave wider uncertainties for state and national vote predictions, leading to win probabilities that were closer to 50%, compared to other prominent forecasts such as the Economist’s. In 2024, the Fivethirtyeight forecast again gives wider uncertainties and a win probability that’s closer to 50%. So, in both campaigns, Fivethirtyeight is giving the cautious forecast—even though it has changed management!
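The link between wider intervals and win probabilities closer to 50% can be sketched in a few lines. This is a toy calculation, assuming the forecast margin is normally distributed; the expected margin of +2 and the two standard deviations are hypothetical and are not taken from either forecast. The helper that converts a 95% interval width into a standard deviation also relies on the normal assumption.

```python
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96

def sd_from_interval_width(width_95: float) -> float:
    """Standard deviation implied by the width of a 95% interval (normal assumption)."""
    return width_95 / (2.0 * Z_975)

def win_probability(expected_margin: float, margin_sd: float) -> float:
    """P(margin > 0) under a normal forecast for the margin."""
    return 1.0 - NormalDist(mu=expected_margin, sigma=margin_sd).cdf(0.0)

# The two state-level interval widths discussed above (32 and 35 points).
sd_2020 = sd_from_interval_width(32.0)
sd_2024 = sd_from_interval_width(35.0)

# Same hypothetical expected margin, two hypothetical levels of uncertainty.
EXPECTED_MARGIN = 2.0
p_narrow = win_probability(EXPECTED_MARGIN, margin_sd=3.0)
p_wide = win_probability(EXPECTED_MARGIN, margin_sd=9.0)

print(f"implied sd: {sd_2020:.1f} (width 32), {sd_2024:.1f} (width 35)")
print(f"win probability: {p_narrow:.2f} (narrow) vs {p_wide:.2f} (wide)")
```

Holding the expected margin fixed, widening the forecast distribution pulls the win probability toward 50%, which is the pattern described above.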
This suggests to me the possibility of some sort of “institutional effect,” and that’s what I was getting at in my earlier post and comments when writing that Fivethirtyeight’s forecast in 2024 is “playing it safe, saying that anything can happen, which is in the tradition of past Fivethirtyeight forecasts.”
When moving from the Economist to Fivethirtyeight, Elliott Morris has moved to giving wider, Fivethirtyeight-style forecasts. From the other direction, after leaving Fivethirtyeight, Nate Silver has moved to giving narrower forecasts, more like what Elliott was producing for the Economist in 2020.
I’m not saying here that Elliott and Nate are explicitly setting up their forecasts with their institutional affiliations in mind. This is more of an offbeat hypothesis on my part that, just maybe, moving to Fivethirtyeight gave Elliott a feeling of increased responsibility that motivated him to think more carefully about uncertainties and where things could go wrong (in particular, the risk that without a large enough potential error term in the national swing, his forecast could too quickly go to one candidate or the other reaching a 99% win probability), while leaving Fivethirtyeight could’ve given Nate a feeling of freedom that could motivate him to think more carefully about sources of information beyond poll aggregation (for example, Biden’s age) which would make him more confident to restrict the range of his forecast.
To conjecture that they could be influenced by their institutional structure is not an insult to either Nate or Elliott; forecasting involves lots of choices, and we are all subject to incentives regarding accuracy, calibration, etc.
P.S. Also again see the P.S. from my previous post, where I briefly discuss prediction markets and the multiple roles of data-based forecasts, an issue that Nate touches on in the first paragraph of his post.
Andrew –
I’m curious whether methodologically speaking, there’s some broad inference re how the different analyses would be affected by Biden withdrawing and Generic Democrat entering the race? Of course, it seems to me that if Biden withdraws, even that would be a fairly useless conjecture – given that the specific replacement would have a big impact. But maybe not?
Joshua:
There’s lots that can be said about this topic but not much direct empirical evidence, so I don’t think it’s in the forecasts.
To put it another way, the models are based on predicting the Democrats’ and Republicans’ shares of the two-party vote. Candidate characteristics typically don’t come into the model, except for incumbency (which I think would count as a negative for Biden, given his low approval ratings) and indirectly through the polls.
So if Biden or Trump, or both, are replaced on the ticket, the models would keep chugging along with the new polls, and the sites would just change their graphics to match the new candidates.
That’s also why in my recent posts, I’ve mostly written about the forecasts for the Democrats’ and Republicans’ vote shares, rather than Biden and Trump.
I have a question about a hypothetical scenario. If one of the candidates were to leave the ticket and be replaced by another candidate, would you recalibrate the model to match the new candidate? I mean, throw out the old Biden vs. Trump polls and wait for new ones? Or would you say, along the lines of your argument about 538’s high uncertainty, that the model allowed for the possibility of a candidate switch? Or, third option, try both (old calibration and new calibration) to see if the model output changes in any meaningful way at all?
Raphael:
What to do with the forecast if one or both candidates were to be replaced on the ticket? I don’t know what the plan is with the Economist. My quick recommendations would be as follows:
1. Replace the fundamentals-based model as appropriate. For example, if Biden is replaced, the Democrats are no longer running an incumbent candidate, but they are still the incumbent party, so the prediction should account for that. And fix any home-state or home-region effects.
2. Keep the old polls but downweight them in some way, probably by adding an error term corresponding to an unknown candidate effect.
That’s all that comes to mind right now but maybe some other things would need to be done too.
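Recommendation 2 can be sketched with simple inverse-variance weighting. This is only an illustration of the idea, assuming independent normal errors; the poll values, their standard errors, and the size of the candidate-effect term are all hypothetical numbers, and a real forecast would estimate such a term inside the full model rather than plug it in by hand.

```python
def pooled_estimate(estimates, variances):
    """Inverse-variance weighted mean and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / total
    return mean, 1.0 / total

# Hypothetical party vote shares (percent) from polls before and after a switch.
OLD_POLL_SHARE, NEW_POLL_SHARE = 48.0, 51.0
POLL_SD = 1.5              # hypothetical sampling sd of each poll average
CANDIDATE_EFFECT_SD = 3.0  # hypothetical sd of the unknown candidate effect

# Treating old and new polls as equally informative.
naive_mean, _ = pooled_estimate(
    [OLD_POLL_SHARE, NEW_POLL_SHARE], [POLL_SD**2, POLL_SD**2]
)

# Downweighting the old polls by adding the candidate-effect variance to them.
adjusted_mean, _ = pooled_estimate(
    [OLD_POLL_SHARE, NEW_POLL_SHARE],
    [POLL_SD**2 + CANDIDATE_EFFECT_SD**2, POLL_SD**2],
)

print(naive_mean, adjusted_mean)
```

With these numbers the old polls get one sixth of the weight instead of one half, so the pooled estimate moves from 49.5 to 50.5, toward the new polls, without the old polls being thrown away.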
@Andrew
Thank you, that is insightful. Did you want to write more? Because your comment ends mid-sentence.
Hi, no, that extra bit had been copied from elsewhere. I went in and removed it.
Computational considerations aside, I would also think (and maybe Andrew has thoughts here) that a new model could fit two separate aggregations in each state for each candidate (Biden/Harris). The ‘Biden fit’, were he to drop out, wouldn’t populate any results in an outlet’s forecast page, but could still be used to inform parameter estimates for pollster/mode-specific biases in the polls.
I struggle to understand narratives like “we’re more polarized than ever” or “incumbency is diminished,” which come with no empirical or statistical certainty that can be taken seriously when analyzing elections. There’s also a larger question about how effective Presidential Polls have been over time. Gallup has done a terrible job in close elections. I am very intrigued by the work of Allan Lichtman and Vladimir Keilis-Borok, which starts with the supposition that elections are won based on the performance of the incumbents: not campaigning, not VP choices, but results. Lichtman codified his prediction into thirteen rules, but today it might be much more effective to take his data set from past elections and build a neural network to predict the results.
Seth:
You write that narratives about polarization and incumbency have “no empirical or statistical certainty.” There’s no certainty and we should not expect certainty. There’s a lot of evidence, though! There’s a long literature of empirical research on political polarization. I’ve even contributed to this literature. It would be a big mistake to think that, just because we have no certainty, all this empirical evidence is nothing.
Seth, start with the Lichtman supposition that elections are won based on incumbents’ performance. Also, that campaigning, VP choice, and probably legacy aka name recognition effects (e.g. being a Bush, Clinton, Kennedy, or even an Auchincloss in MA down ballot) don’t impact outcomes. In a rational world, this makes sense: Accomplishments! Meritocracy! Yet if it were correct, then the cottage industry of political consulting wouldn’t exist. PACs, campaign fundraising, and TV ads would mostly be a waste of time and money.
See Andrew’s reply to Raphael, noting that if Biden is replaced, Democrats will lose some candidate advantage but retain some incumbent party benefit.
Regarding polarization research, Andrew has a reputation for doing it well and without bias. I lean right, so I confirmed that (anecdotally not with rigor) with right-wing political science and statistics people in academia. Nate Silver says more polarization makes polls “less swingy” now versus 1948-1988. Seems solid. The electorate as well as those with power had “more drift” then. Examples: RFK wanted to be chief legal adviser for Joseph McCarthy’s Senate Un-American Activities Committee in the 1950s but didn’t get it until Roy Cohn resigned. JFK gave the eulogy at McCarthy’s funeral. All (other than Red Scare McCarthy) were lifelong registered Democrats, including Roy Cohn. Bostonian and Standard Oil heir Hugh Auchincloss Sr was GOP but when he married Jacqueline Kennedy’s mother, he switched his massive political donations to the Democrats, because he didn’t want strife with his in-laws.
Now: Auchincloss’s great-nephew (Fauci’s deputy for 20 years at NIAID), loved meeting US presidents, yet in his official NIH bio, said he would do anything to AVOID meeting Donald Trump. After a few amateurish SQL queries, I found that many active (mostly 18-45 yr old) redditors are furious about the Trump assassination attempt because it was UNsuccessful.
Something to cheer up Professor Gelman, Daniel Lakeland, and Phil: Biden is on track to lose the Electoral College (“A month or so ago, the notion that President Trump would win reelection was dismissed as delusional. No longer.”) via The Boston Globe dated 1 SEPTEMBER 2020!
> Yet if it were correct, then the cottage industry of political consulting wouldn’t exist. PACs, campaign fundraising, and TV ads would mostly be a waste of time and money.
While I think I speak for many in wishing that political consulting would not exist, and that there is *some* advantage to political advertisements, it does seem like the current equilibrium is indeed a wasteful allocation of resources. Put it this way: while ads may be effective, they seem to be in most cases highly inefficient.
I think another main point Nate is making is that the unintuitive blending of fundamentals and polls in states (like Wisconsin), where the final forecast is higher for Biden than either fundamentals or polls alone, could suggest an error or flaw in the model. I also didn’t really follow Morris’s explanation for this. This might be a minor effect on the current win probabilities given the large error bars as discussed.
Michael:
Yes, it’s also possible that the 2024 Fivethirtyeight forecast has bugs in its implementation. My guess is that these sorts of unexpected patterns are artifacts of how the dependence between states is included in the forecast, in the same way that the 2020 Fivethirtyeight forecast had weird artifacts such as its conditional prediction that if Trump won New Jersey, his chance of winning Alaska would be 58%, which was less than his assessed chance of winning Alaska conditional on losing New Jersey. It’s hard to make a multivariate forecast of these 50 correlated outcomes, and the fixes that we put in to make sure the headline results make sense, can induce undesirable intermediate results.
I believe that was because the state correlation matrix used by Nate’s model allows for negatively correlated states, whereas the current FTE/Economist models use a GP over distances in some feature space to construct the correlation matrix, which restricts to positive values. I think the latter is more defensible, but come election day, shouldn’t make too much of a difference in terms of forecast output.
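As an illustration of why a kernel over distances restricts correlations to positive values, here is a minimal sketch using a squared-exponential kernel. The choice of kernel, the two-dimensional feature vectors, and the length scale are all assumptions made for this example; the comment above does not say which kernel or features the actual models use.

```python
import math

def squared_exponential_correlation(features, length_scale):
    """Correlation matrix with entries exp(-d^2 / (2 * length_scale^2))."""
    n = len(features)
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            corr[i][j] = math.exp(-sq_dist / (2.0 * length_scale**2))
    return corr

# Hypothetical standardized features for three unnamed states.
state_features = [
    (0.2, -1.0),
    (0.5, -0.8),
    (-1.5, 1.2),
]

corr = squared_exponential_correlation(state_features, length_scale=1.0)
for row in corr:
    print([round(c, 3) for c in row])
```

Every entry is the exponential of a real number, so states that are far apart in feature space get correlations near zero but never below it. A correlation matrix estimated or specified freely has no such restriction.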
So the more I look into it, the more I think Elliott Morris’s explanation is correct and it’s not a bug, just a flawed model.
First, it’s of course questionable to start with a prior that Biden wins Wisconsin by 0.2, and when you add a bunch of poll data showing Trump ahead both nationally and in Wisconsin, to get a posterior that Biden will win by 0.9.
The best explanation I can think of is that the fundamental model is given most of the weight for the mean prediction nationally, but the poll data is strongly influencing the state correlation posterior. So, while Wisconsin has been 3 points more red than the national popular vote the last two cycles, it’s been about even or even bluer than the national average in the polls this cycle.
So, the 538 model concludes that because of the fundamentals, Biden will outperform the current polls come election day, and that based on the current polls, Wisconsin will be relatively bluer compared to the last couple cycles.
As I said in a previous post, I think it’s probably bad for the model to weight the poll state correlations heavily compared to the mean. I’m not convinced that polls can’t detect the mean (1st order parameter), but can capture the more subtle state correlation matrix.
But like Andrew says, you ride your model until it breaks down, and how it breaks instructs you on where to make changes. This type of modeling is hard.
So to beat this to death, here’s how the math would work in WI. The fundamentals say Biden wins the national vote by 3.3 points, and Trump does 3.2 points better in Wisconsin than the national vote split. So 3.3 - 3.2 = 0.1 Biden in Wisconsin.
Polls say Trump wins nationally by 1.5 and in Wisconsin by 2.4 (so only 0.9 points better in Wisconsin than nationally).
The 538 model currently weights the fundamentals more for national vote split, but weights the polls more for relative performances, so it ends up thinking Biden wins by 2.1 nationally (closer to fundamentals 3.3 than polls -1.5), and Trump only does 1.2 points better in Wisconsin than nationally (closer to polls 0.9 than fundamentals 3.2). 2.1 - 1.2 = 0.9. Thus Trump is predicted to do worse than in either fundamentals or polls.
Then again, I don’t have access to the model, so maybe it’s all a bug.
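The arithmetic in this comment can be checked with a short script. The margins are the rounded figures quoted above (positive means Biden ahead). The "implied weights" are not quantities inside the 538 model; they are just what a simple weighted average would need in order to reproduce those rounded figures.

```python
def implied_weight(blended: float, fundamentals: float, polls: float) -> float:
    """Weight w on fundamentals such that blended = w*fundamentals + (1-w)*polls."""
    return (blended - polls) / (fundamentals - polls)

# National Biden margin: fundamentals, polls, and full forecast.
fund_national, poll_national, full_national = 3.3, -1.5, 2.1

# Wisconsin relative to the nation (negative means WI is redder than the nation).
fund_wi_relative, poll_wi_relative, full_wi_relative = -3.2, -0.9, -1.2

fund_wi = fund_national + fund_wi_relative  # about +0.1
poll_wi = poll_national + poll_wi_relative  # about -2.4
full_wi = full_national + full_wi_relative  # about +0.9

w_national = implied_weight(full_national, fund_national, poll_national)
w_relative = implied_weight(full_wi_relative, fund_wi_relative, poll_wi_relative)

print(f"WI margins: fundamentals {fund_wi:+.1f}, polls {poll_wi:+.1f}, full {full_wi:+.1f}")
print(f"implied weight on fundamentals: national {w_national:.2f}, relative {w_relative:.2f}")
```

Because the implied weight on fundamentals is high for the national level (0.75) and low for the state-versus-nation offset (about 0.13), the combined Wisconsin number can land outside the range spanned by the fundamentals-only and polls-only numbers.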
Kj:
Just one thing. You can colloquially say that the Fivethirtyeight forecast “currently weights the fundamentals more for national vote split,” but the forecast doesn’t actually weight anything. It does not compute a weighted average. What it does is fit a model that includes latent parameters for various errors, including drifts in public opinion and systematic polling biases. The result after fitting the model can be approximated by some sort of weighted average, and that sort of approximation can be valuable in understanding the model. I just want to emphasize that the amount of this “weighting” is a consequence of the hyperparameters in the model; it’s not itself a knob that is directly set by the designer of the model.
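Here is a minimal sketch of how an effective weight falls out of the variance hyperparameters rather than being set directly. It assumes a one-dimensional normal model: a normal prior from the fundamentals, and a poll average whose error variance is the sum of sampling noise, drift between now and the election, and a systematic polling bias. All the standard deviations are hypothetical, and the actual forecasts are multivariate and far more elaborate.

```python
def poll_weight(n_polls, prior_sd, sampling_sd, drift_sd, bias_sd):
    """Posterior weight on the poll average in a normal-normal model.

    Only the sampling term shrinks with more polls; drift and systematic
    bias do not average away.
    """
    poll_variance = sampling_sd**2 / n_polls + drift_sd**2 + bias_sd**2
    poll_precision = 1.0 / poll_variance
    prior_precision = 1.0 / prior_sd**2
    return poll_precision / (poll_precision + prior_precision)

PRIOR_SD, SAMPLING_SD, BIAS_SD = 4.0, 3.0, 2.0  # hypothetical values

w_one_poll = poll_weight(1, PRIOR_SD, SAMPLING_SD, drift_sd=2.0, bias_sd=BIAS_SD)
w_many_polls = poll_weight(100, PRIOR_SD, SAMPLING_SD, drift_sd=2.0, bias_sd=BIAS_SD)
w_swingy = poll_weight(100, PRIOR_SD, SAMPLING_SD, drift_sd=6.0, bias_sd=BIAS_SD)

# Upper limit on the poll weight as the number of polls grows without bound.
weight_cap = PRIOR_SD**2 / (PRIOR_SD**2 + 2.0**2 + BIAS_SD**2)

print(f"{w_one_poll:.2f} {w_many_polls:.2f} (cap {weight_cap:.2f}) swingy: {w_swingy:.2f}")
```

In this toy setup, the disagreement about how "swingy" the polls are shows up as a disagreement about `drift_sd`, and the apparent weight on the fundamentals follows from that choice.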
The relationship between the (median) final forecasts and the fundamentals and polls forecasts is pretty similar for all the states. [DC is an outlier and will be excluded from the regression below.]
The best (in least-squares sense) way to recover the full forecast as a weighted average of the fundamentals and polls forecast plus a bias is to take approximately 20% of the fundamentals forecast and 80% of the polls forecast and add 2.5 percentage points. (Alternatively, add 3.2 percentage points of bias to the polls forecast before averaging with the fundamentals forecast.)
[If DC is included the weights change from 21/79 to 26/74 and the bias is slightly reduced from 2.5 (3.2 if applied directly to the polls forecast) to 2.3 (3.1).]
https://imgur.com/a/BytkrZG
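A summary regression of this kind can be sketched as follows. This assumes the two weights are constrained to sum to one, which reduces the fit to a one-variable regression of (full - polls) on (fundamentals - polls); whether the regression described above imposed that constraint is not stated. The state values below are synthetic, generated from a known weight and bias so that the fit can be checked; they are not the Fivethirtyeight numbers.

```python
def fit_constrained_blend(fund, poll, full):
    """Least-squares fit of full = lam*fund + (1 - lam)*poll + alpha."""
    x = [f - p for f, p in zip(fund, poll)]
    y = [t - p for t, p in zip(full, poll)]
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    lam = sxy / sxx
    alpha = y_mean - lam * x_mean
    return lam, alpha

# Synthetic margins for four made-up states.
TRUE_LAMBDA, TRUE_ALPHA = 0.2, 2.5
fund_margins = [0.1, 5.0, -4.0, 8.0]
poll_margins = [-2.4, 1.0, -6.5, 6.0]
full_margins = [
    TRUE_LAMBDA * f + (1 - TRUE_LAMBDA) * p + TRUE_ALPHA
    for f, p in zip(fund_margins, poll_margins)
]

lam_hat, alpha_hat = fit_constrained_blend(fund_margins, poll_margins, full_margins)
print(round(lam_hat, 3), round(alpha_hat, 3))
```

On real data the fit would leave residuals, and the recovered weight and bias would only be a description of the forecast's current output.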
Carlos,
First of all, how did you get that data? Did you manually collect it? Great stuff and really informative!
As Andrew pointed out in his last comment to me, ultimately we’re talking about a multivariate multilevel model with a lot of inputs and parameters, with a random walk process, correlation matrices, etc. Even talking of weighting is a gross oversimplification. In reality, different states will be pulled different amounts based on how correlated they are with other states, how much polling has been done, etc. You also have weird places like DC that are operating on a different part of the logistic curve and could behave differently.
There is no simple regression weighting between the two. But, if I had to guess at a simple weighting that would fit the projections well, I wouldn’t use:
proj_state = lambda*fund_state + (1-lambda)*poll_state + alpha.
I would instead bet this model to fit better:
proj_state = lambda*(fund_state - fund_national) + (1-lambda)*(poll_state - poll_national) + alpha.
If you really want, you can split alpha into: lambda2*fund_national + (1-lambda2)*poll_national, but that’s just for interpretation purposes.
The data shown in the “What do the polls and fundamentals alone say?” section comes from here: https://projects.fivethirtyeight.com/2024-election-forecast/priors.json
The metrics available for each state and at the national level are “Adjusted polling average”, “Forecast polling average” (same median, wider interval), “Fundamentals” and “Full forecast”. Sometimes there is also a “Polling average” – I haven’t looked at it in detail. I have not made use of lower/upper values either but they are available.
> There is no simple regression weighting between the two. But, if I had to guess at a simple weighting that would fit the projections well, I wouldn’t use:
> proj_state = lambda*fund_state + (1-lambda)*poll_state + alpha.
> I would instead bet this model to fit better:
> proj_state = lambda*(fund_state - fund_national) + (1-lambda)*(poll_state - poll_national) + alpha.
It’s almost the same model. If we explicitly write the error terms for each state and at the national level:
proj_state = lambda*fund_state + (1-lambda)*poll_state + alpha + error_state
proj_national = lambda*fund_national + (1-lambda)*poll_national + alpha + error_national
subtracting the second from the first we get
proj_state - proj_national = lambda*(fund_state - fund_national) + (1-lambda)*(poll_state - poll_national) + error_state - error_national
and rearranging a bit
proj_state = lambda*(fund_state - fund_national) + (1-lambda)*(poll_state - poll_national) + (proj_national - error_national) + error_state
The residual in my regression for the national numbers was very low, so the alpha in the one you propose is essentially the full forecast at the national level.
Indeed if I calculate the model you propose I get lambda 0.214 (the same that was obtained with the previous model) and alpha 2.07 (pretty close to the full forecast given at the national level of 2.1).
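The derivation implies that the intercept of the centered model equals the original intercept plus the blended national numbers. A quick check, using the rounded figures quoted in this thread (lambda 0.214 and bias 2.5 from the regression; national fundamentals of 3.3 and national polls of -1.5 from the Wisconsin example earlier in the comments, which may differ slightly from the values in the data file):

```python
LAMBDA = 0.214           # weight on fundamentals reported above
ALPHA_UNCENTERED = 2.5   # bias reported for the uncentered model (rounded)
FUND_NATIONAL = 3.3      # national fundamentals margin quoted earlier in the thread
POLL_NATIONAL = -1.5     # national polls margin quoted earlier in the thread

# Centering the regressors only shifts the intercept:
# alpha_centered = alpha + lambda*fund_national + (1 - lambda)*poll_national
alpha_centered = (
    ALPHA_UNCENTERED + LAMBDA * FUND_NATIONAL + (1 - LAMBDA) * POLL_NATIONAL
)

print(round(alpha_centered, 2))
```

This gives about 2.03, which agrees with the reported 2.07 and the national full forecast of 2.1 to within the rounding of the inputs.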
Carlos,
Thanks for the link. I was able to run the models in colab and look at the outputs. And you’re right that the models are mathematically equivalent.
But, ultimately this is an exercise of function fitting, not mechanistic models. So it’s important not to portray our models as some causal explanation for how the 538 model is working under the hood. Looking at your model, it would say that 538 combines the polls and fundamentals with a 79/21 split, and then biases it all 2.5 points in favor of Biden. That would be a questionable of 538.
My model (which also is just function fitting), instead frames it as 538 finding the national forecast with a 31/69 split favoring the fundamentals, and then to determine how each state is relative to the national level, it uses a different weighting scheme of 79/21 split favoring the polls.
Both models account for oddities like OH and WI, and neither is really what’s going on behind the scenes. But it does show that you can explain the oddities without someone putting their thumbs on the scales, which I think is important.
These models are complicated, and figuring out what they are doing and why is hard. Also, conveying this info to the public is hard. For example, a lot of people are confused by the fundamentals getting so much weight when the adjusted polls (with drift) seem much narrower than the fundamentals prior. But, if you read their methodology, they add an additional t-distribution error beyond what’s shown to account for systematic biases. That results in the fundamentals getting more weight.
When models are opaque, it’s easy to assume bugs or bad actors, but in this case, I think it’s just subtleties and a slightly flawed model.
> it’s important not to portray our models as some causal explanation for how the 538 model is working under the hood.
I agree. It’s just a way to summarize what it happens to do right now – not a description of how it’s done and it wouldn’t help much to predict what it may do in other circumstances.
> Looking at your model, it would say that 538 combines the polls and fundamentals with a 79/21 split, and then biases it all 2.5 points in favor of Biden. That would be a questionable of 538.
One may say that in this model it biases all 2.5 points in favor of Biden also because of the fundamentals. After all the fundamentals are the only thing that can move the (median) “forecast of the popular vote” to something different from the “forecast of polling average on Election Day”. At least if the “polling bias” component is indeed unbiased.
> My model (which also is just function fitting), instead frames it as 538 finding the national forecast with a 31/69 split favoring the fundamentals, and then to determine how each state is relative to the national level, it uses a different weighting scheme of 79/21 split favoring the polls.
Yes, it comes down to the same thing. There is a “block” shift that “biases it all in favor of Biden” and then there is little influence of fundamentals at the state level. I’m not sure how appropriate it is to frame it as 538 finding the national forecast though, because in Morris’s tweet cited elsewhere he writes that “the national forecast comes from aggregating the state estimate.”
Maybe there is a very strong correlation imposed in the “polling bias” distribution that causes the correction to be almost the same in all states – falling short of the fundamental forecast in some cases but going beyond it in other cases. They could give details about how the systematic biases look in the simulation to make the bridge from “polling forecast” and “fundamental forecast” to the “full forecast” clearer – but theirs is the most transparent prediction already.
I haven’t yet found the time to study the 2024 models. I have this question: do any of these models take into account that both candidates are “incumbents”? One part of the “incumbent” advantage presumably comes from voters knowing what to expect; in this case, voters know what to expect from either.
Since you’ve been involved in some of the professional model building in the past, do you have a sense of why the models are almost always presented in terms of a single forecast rather than multiple different counterfactuals? It seems like 538 would be educating the public more if they broke out forecasts into different scenarios: What if the election was held today? What if polls swing as much as they have over the past century? What if they swing as much as they have over the past decade? What if incumbency matters as much as it did over the past century? What if incumbency doesn’t matter at all any more? What if polls are correlated more/less than in the past? Etc. I guess I find the convention of wrapping all of these scenarios into a single forecast but then having arguments about the weights that each scenario was assigned to be odd. It’s not like we’re diagnosing someone with cancer where we need a clear “yes” or “no” … isn’t the value in this modeling precisely in revealing the many possible outcomes?
Regarding 538 vs other forecasts: Nate Silver, The Economist and ABC may all have different but equally legit objectives for their election forecasts which could shade some features of their forecasts. Nate writes and sells books about probability and forecasting. Presumably he’s interested in previewing his expertise on such. ABC and The Economist are both news orgs but they have different audiences: ABC is primarily a broadcaster appealing to a mostly American audience across a broad range of cultural backgrounds and education. The Economist is a weekly print mag whose audience is the social and intellectual upper crust throughout the English-speaking world. ABC has legit incentives to keep its forecasts on the higher end of the legitimate uncertainty range to maintain a broad audience as there are numerous other news providers in its niche. OTOH, the audience for The Economist is more educated, more aware of the nuances of forecasting, and has fewer places to turn for the high quality information that The Economist offers.
I wouldn’t be at all surprised if ABC news execs had leaned on Nate Silver in 2020 not to publicly discuss the internal dynamics of the model, believing, probably correctly, that such a discussion would reduce the credibility of the forecast in the eyes of ABC’s audience; and to keep the error bars wide enough not to offend supporters of the trailing candidate, at least insofar as that can be justified by the polls and public milieu.
That just seems like the way the various people’s and organizations’ interests pan out.
What do you make of the observation Nate made (and others have made) that Biden’s chances have improved in the 538 model as Biden’s polls have gone down? That seems intuitively wrong. Nate and you both discuss elements that might contribute to that – I can’t say I fully understand all of this. But it’s something that the less data savvy (like me) are confused by.
I read a reasonable explanation for this phenomenon in another comment. It said that it is not only important *if* the vote share changes, but also *where* it changes. If a ‘blue state’ is less blue this year, but there are slight improvements for Biden in a swing state, this could well explain his improved chances. That said, this is definitely a question worth pursuing by the model builders. The model consumers (I count myself among them) will certainly be interested to hear the explanation!
Michael:
I’m not sure. It could be some conflicting information in state and national polls, or it could be that the variation in the polling data caused the model to increase its estimate of the scale variation over time, thus changing the amount of partial pooling.
In general, I’d say that the model-based approach used by Fivethirtyeight and the Economist has some problems, and the weighting approach used by Nate has some problems. The problem with the model-based approach is that it has many moving parts, and the mapping from data to inferences isn’t always clear, and of course the model isn’t perfect so when the inferences aren’t clear, there are concerns that the model is doing something wrong. The weighting approach has the advantage of clarity and the disadvantage that you’re basically constructing the forecast by hand, and it’s hard to get this to work simultaneously for all 50 states (as indicated for example by the tail problems with the Fivethirtyeight forecast in 2020).
Either approach has enough “knobs” or researcher degrees of freedom that you can make sure that it gives reasonable results. But, as Nate correctly notes, once the campaign is on and data come in, either approach can drift off and give forecasts that don’t seem plausible. A lot of the work in constructing a forecasting procedure (as opposed to a one-time forecast) is in anticipating what might go wrong.
The setting of the knobs also implies that similar methods can give different forecasts. The 2024 Fivethirtyeight model is, I think, similar to the 2024 Economist model, but they give different forecasts. As discussed in the above post, I think that much of the difference between Fivethirtyeight’s forecast and others comes down to whether they’re allowing for 1988-style opinion swings. The funny thing is how Nate and Elliott have switched positions on this issue: in 2020, Nate was the one with the wide intervals and Elliott was the one assuming the polls would be stable, whereas in 2024 they’ve switched positions.
It’s a very different system, but the recent UK general election might be worth a comparison.
The Labour party, who won, had their vote share drop from 40% in the last-but-one election in 2017, where they won 262 seats, to 34% in 2024 and 411 seats (after falling to 32% and 217 in 2019). Compared to 2017, Labour lost votes in their safe seats, but gained votes in marginal and Conservative-leaning seats.
So maybe the models are somehow taking into account that lower national polling could lead to a more efficient distribution of votes in terms of the electoral college? That might not necessarily be true of course, but it doesn’t mean the scenario is totally wrong.
One could imagine losing very progressive voters over Gaza, but gaining voters who strongly support Israel and back Ukraine. (But now that I write it, that seems slightly incongruous with how partisan polarization has shaped people’s views in recent years …)
Nonrenormalizable:
The U.K. is more complicated, with its multiparty system. That’s one reason why the use of MRP to predict constituency-level vote outcomes is such a big deal in the U.K. In the U.S. it actually works pretty well to just predict the national vote shares and then allocate the votes by state following the past election. In the U.K., that won’t work.
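Here is a minimal sketch of the "predict the national vote share, then allocate by state following the past election" idea, under the simplest assumption of a uniform swing: every state's two-party share moves by the same amount as the national share. The state names and all numbers are made up for illustration, and the function name `uniform_swing` is my own; this is not any forecaster's actual procedure.

```python
def uniform_swing(prev_state_share, prev_national_share, new_national_share):
    """Shift every state's previous two-party share by the national swing."""
    swing = new_national_share - prev_national_share
    # Clip to [0, 1] so an extreme swing cannot produce an impossible share.
    return {state: min(1.0, max(0.0, share + swing))
            for state, share in prev_state_share.items()}


# Hypothetical two-party shares for one party in the previous election.
prev_state = {"State A": 0.58, "State B": 0.505, "State C": 0.47}

# Suppose the national share is projected to fall from 52% to 50%.
projected = uniform_swing(prev_state,
                          prev_national_share=0.52,
                          new_national_share=0.50)

# A state is projected as a win if the party stays above half the two-party vote.
winners = {state: share > 0.5 for state, share in projected.items()}
print(projected)
print(winners)
```

In this toy setup the two-point national drop flips State B, which had been held narrowly. With several large parties, as in the U.K., a single national swing number no longer pins down who wins each constituency, which is the point being made above.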
Thanks for responding Andrew. Just to briefly continue on this:
While I agree having significant third (and fourth, fifth and more) parties does change the dynamics quite a lot (the two-major-party vote share is now around 50-60%, down from regularly being ~80-90%), isn’t the main point still true? I.e. it’s possible for the national polls to show one candidate slightly behind in the expected national vote share, but this may be correlated with an increase in that candidate’s probability of winning as their votes have become better geographically distributed? This is basically what happened with Trump in 2016 and Bush in 2000.
To frame it slightly differently:
1. Using Michael J’s description above, if a model predicts a candidate’s chances improving while their poll numbers have gone down, is this indicative of a popular vote-electoral college split? And if that’s the case,
2. If the model predicts that the *Democratic* candidate loses the popular vote but wins the electoral college, is that reason to be more suspicious of the model than if it predicted the Republican candidate to do the same?
-Worth noting that Nate Silver’s current model is basically his 2020 model from FiveThirtyEight, he kept the IP and is rerunning the model with updated data. The only major change to my knowledge is that in 2020 FiveThirtyEight increased uncertainty because of the effect of COVID, this has been undone. (So the implication that Nate has changed his model is not really true, because it’s largely the same model that’s gradually evolved over 5 presidential election cycles.)
-Also worth noting that this article does not really address that the Morris-538 model does not seem to “add up” at the state level in terms of its final prediction and its polling and fundamentals only numbers and interval bands.
Glen:
What’s important is not just the structure of the method but how its parameters are being set.
Elliott’s method for Fivethirtyeight in 2024 is, I think, close in structure to his method for the Economist in 2020. And I’ll take your word that Nate’s method for his blog in 2024 is close in structure to his method for Fivethirtyeight in 2020. I’d also guess that both Nate and Elliott made various changes to their methods in order to fix imperfections in past forecasts.
Nonetheless, in 2020, Nate’s forecast was giving much wider intervals than Elliott’s (for example, that wide prediction for Florida), while in 2024, Elliott’s forecast was giving much wider intervals than Nate’s (for example, that wide prediction for Florida). I suspect the main reason for this is that, in 2020, Nate set his tuning parameters to allow for the possibility for large swings in public opinion during the campaign and large polling biases, two sources of uncertainty that made his forecast intervals wider and brought his win probability forecast closer to 50/50, while in 2024, Nate is allowing for smaller swings and biases, while Elliott has tuned his model to allow for more uncertainty.
Regarding state-level issues, yeah, I don’t know. I wrote something about this in a comment above. The short answer is that these forecasting models are complicated, and if you look at any of them carefully enough, you can often find artifacts. That’s one reason that, in my post above, I wrote:
I appreciate that Nate wrote that post, and I’m trying to contribute in a cooperative spirit with my post here. I hope that Nate, Elliott, and others can make use of our discussion when figuring how to do better going forward.
Since 2020 was an especially poor year for polling, both in terms of overall error and bias, wouldn’t a model that incorporates historical uncertainty be more uncertain with 2020 included than without it? If so, then a model that gets less uncertain seems more problematic (at least how it models uncertainty) than one that gets more uncertain.
Hi Andrew,
I see many people state that the reason that 538 is relatively bullish on Biden is because of strong fundamentals for him and that the model expects polls to move toward Biden as a result. My issue with this, is that it seems to be directly contradicted by the output of 538’s model. If you visit any state forecast page at 538, you’ll see a summary of various metrics related to the model. Two of these are the Adjusted Polling Average (“The polling average adjusted for movement in similar states and the effects of party conventions…”) and the Forecast of polling average on Election Day (“An adjusted version of our polling average that accounts for potential movement in the race between now and Election Day…”). For each state, the point estimates for these two are the same, with the Election Day version having high uncertainty. To me, the plain reading here indicates that the model isn’t expecting the polls to move toward the fundamentals.
Similarly, Morris pointing to correlations between states as a reason for the strong forecast in Wisconsin is rather unsatisfying because Biden is already doing better in Wisconsin (relative to fundamentals) than most other states.
Now, perhaps there are good reasons for these apparent issues. Perhaps there are latent variables that aren’t well displayed, or the descriptions of the polling averages and fundamentals aren’t fully clear and maybe Wisconsin’s outcome is particularly strongly correlated with some data that is looking very good for Biden. The problem here is that it’s all just speculation, I don’t know if any of it is right. Consequently, I feel like I can’t understand how the model is going to react to new data, so I can’t really trust that it’s making good decisions. And I feel like I’m not actually learning much about the state of the race by following it, because I don’t understand which factors are most responsible for its predictions.
I’m a big fan of the saying, “all models are wrong, but some are useful”. And for me, that generally comes down to interpretability. Essentially, do I understand why the model made this decision and in what circumstances it may be less accurate? I’m not getting that level of understanding with 538’s model, so I feel like it doesn’t even matter if it’s right, it’s just not very useful for me.
> I see many people state that the reason that 538 is relatively bullish on Biden is because of strong fundamentals for him and that the model expects polls to move toward Biden as a result.
I think it’s not “the model expects polls to move toward Biden” but “the model expects polls to systematically underestimate Biden”.
“Forecast of polling average on Election Day” : “does not account for the chance that the polls systematically underestimate one candidate”
“Full forecast” : “based on both polls and fundamentals and accounting for the chance that polls systematically underestimate one candidate.”
What’s not clear is how that “systematic underestimation” is “accounted for”. It could be a “candidate-agnostic” way to put together polls and fundamentals but it seems to favour Biden. It’s difficult to explain Wisconsin, Ohio, etc.
G. Elliott Morris has stated that “The model has the ability to guess how much polls should revert towards the fundamentals by Election Day…” (https://x.com/gelliottmorris/status/1812505261950587082).
I would expect that the Forecast of polling-average on Election day would be exactly what it says, although it doesn’t appear compatible with Morris’s statement.
I would also consider the potential for the polls to underestimate a candidate’s actual level of support among the population (i.e. systemic polling bias) a separate source of error from the potential for a change in voter preference between now and election day. So, I’d think that latter source would be included in a forecast of the election-day polling average. Perhaps not, but that’s the sort of detail that I’d expect to have been clarified by this point.
Jed:
It would be ideal if the Fivethirtyeight forecast, the Economist forecast, Nate’s forecast, and everybody else’s were completely open source. I don’t think any of them are, so I’m not really sure that it makes sense to demand more transparency from Fivethirtyeight than from the rest of us.
I guess the issue is that if a forecast is entirely open-source, it loses much of its value because another organization can copy it, or copy it and make tiny alterations. I agree that more explanation is better.
Regarding your specific question, I’m pretty sure that the Fivethirtyeight model includes separate error terms for drift in public opinion during the campaign, and systematic bias of the polls. Both are important.
Hi Andrew – I wanted to respond to your comment in which you mentioned open source, but for whatever reason the blogging software doesn’t give me a Reply button on that comment, so I have to reply to the parent comment instead:
An idea I had: what if someone ran a competition for open source forecasting models for 2028? People submit their code at a certain date before the election, then the competition organiser runs the models, feeding standardised data updates into each one as it becomes available (e.g. poll results), and then once the election happens, evaluates how well each model performed? Each model would have a fixed allotment of CPU time/memory/storage per update.
Actually, is there any formal methodology for evaluating the performance of election forecast models, once the election being forecasted has actually happened?
Simon:
An open-source forecasting competition would be fine, but I don’t think the scoring would tell us much, because it’s just one election, even if you look separately at the 50 states. A score could tell you that a bad forecast is bad (for example, in 2016, the Princeton forecast that gave Hillary Clinton a 99% chance of winning in the electoral college, or Scott Adams’s forecast that Trump would win by a landslide), but it wouldn’t do much to distinguish between different competent forecasts.
To put it another way, you could conduct a forecasting competition and design a scoring rule and declare a winner, but I don’t think that would tell us much about the quality of the forecasts.
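To make the scoring point concrete, here is a small sketch using the Brier score, which is one standard scoring rule for probability forecasts (the squared error between the stated probability and the 0/1 outcome, lower is better). The three forecast probabilities are hypothetical stand-ins, chosen only to echo the 50%, 75%, and 99% numbers mentioned in this discussion.

```python
def brier_score(prob, outcome):
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (prob - outcome) ** 2


# Hypothetical forecast probabilities for the same binary event.
forecasts = {"even odds": 0.50, "leaning": 0.75, "near-certain": 0.99}

# Score every forecast under both possible outcomes of a single election.
scores_by_outcome = {
    outcome: {name: brier_score(p, outcome) for name, p in forecasts.items()}
    for outcome in (1, 0)
}

for outcome, scores in scores_by_outcome.items():
    ranking = sorted(scores, key=scores.get)  # best (lowest score) first
    print(f"event {'happens' if outcome else 'does not happen'}: {ranking}")
```

With one realized outcome, the 99% forecast is punished severely if it misses, so a score can flag it as bad. But the ordering of the 50% and 75% forecasts simply flips depending on which way the one election goes, so a single score says little about which of the two competent forecasts was better calibrated.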
Thanks for your reply Andrew.
One problem with comparing different closed source models is that they are moving targets–I assume they are never the exact same code and hyperparameters from cycle to cycle. With such an open source competition, one could re-run the previous cycle’s entries unchanged every four years, and hence build up a multi-year track record. Entrants could submit an updated model each cycle, but their prior entries would run in parallel with it. Maybe that would make it easier to distinguish between competent teams, at least in the long-run?
> G. Elliott Morris has stated that “The model has the ability to guess how much polls should revert towards the fundamentals by Election Day…”
That tweet is somewhat confusing. It doesn’t mention “polling bias” as it does in the methodology page. It talks about just two pieces:
> (a) a model of the polling data and (b) a model of historical fundamentals data.
I imagine that the “accounting for the chance that polls systematically underestimate one candidate” is part of (a).
> The model has the ability to guess how much polls should revert towards the fundamentals by Election Day. It does this by adding a random variable drawn from a multivariate distribution.
I understand that this random variable includes both the uncertainty about the evolution of the polls (which vanishes as we get closer to the election) and about the “polling bias” (which doesn’t).
> This amount of movement (by design) is not mean-zero across states or simulations. Polls can move up or towards fundamentals — by less when we’re close to the election and more when we’re far away.
I’m not sure what the first sentence above means. I would expect the random variable added to be “mean-zero across simulations”. A few days ago I was not sure if “accounting for polling bias” involved some kind of “anti-bias” adjustment but now I think it’s just a widening of the “polling” distribution that makes the posterior gravitate towards the “fundamental” distribution more strongly. Maybe that’s what he means by “not-mean zero by design”.
The “less” and “more” in the second sentence are compatible with election-day uncertainty being only about “polling bias”.
However, he also writes the following later, but surely the “polling bias” is still present:
> This adjusting will decrease as we get closer to Election Day (and is gone on E-Day) since there is less (no) uncertainty to simulate about how things will move by Nov 5
One will still have on the day of the election a “forecast of the popular vote, based on both polls and fundamentals and accounting for the chance that polls systematically underestimate one candidate” and it could still be between polls and fundamentals in some states but be more extreme in others.
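One way to see the "widening the polling distribution pulls the result toward the fundamentals" reading is a plain precision-weighted combination of two normal estimates. This is only a sketch of that general idea, assuming both pieces are normal and independent; it is not the Fivethirtyeight model, and the margins and standard deviations below are invented.

```python
def combine(poll_mean, poll_sd, fund_mean, fund_sd):
    """Precision-weighted combination of two independent normal estimates."""
    w_poll = 1.0 / poll_sd ** 2   # precision of the polling estimate
    w_fund = 1.0 / fund_sd ** 2   # precision of the fundamentals estimate
    mean = (w_poll * poll_mean + w_fund * fund_mean) / (w_poll + w_fund)
    sd = (w_poll + w_fund) ** -0.5
    return mean, sd


# Hypothetical margins in percentage points: polls say -2, fundamentals say +3.
narrow_mean, narrow_sd = combine(poll_mean=-2.0, poll_sd=2.0,
                                 fund_mean=3.0, fund_sd=4.0)

# Same inputs, but the polling distribution is widened (say, to allow for
# possible polling bias) without shifting its center.
wide_mean, wide_sd = combine(poll_mean=-2.0, poll_sd=4.0,
                             fund_mean=3.0, fund_sd=4.0)

print(narrow_mean, narrow_sd)
print(wide_mean, wide_sd)
```

In this toy example the widening is itself mean-zero, yet the combined estimate moves from -1.0 to +0.5, toward the fundamentals. That is one way a candidate-agnostic adjustment can end up looking like a shift in favor of one candidate.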
@Andrew
Thanks for your thoughts. I would say that my criticism isn’t really focused on transparency by 538 (I find their methodology page well written and appreciate Morris’s responses to criticism). Rather, it’s that there appears to be a disconnect between the methodology and how their model responds to data, and Morris’s responses haven’t clarified much on those points. If I can’t trust that the model is acting as their methodology says it does, then it essentially becomes a black box without a track record.
@Carlos.
I interpret “polls systematically underestimate one candidate” to refer to a potential gap between polling numbers and the true opinion of the populace. I’d also think that the idea behind polls moving toward fundamentals as election day approaches is that the fundamentals are an indicator of how voters might change their minds (because they are more or less forgiving of an incumbent based on the economy, for example). So, a systematic bias may exist now and at election day, but can only be measured on election day because that’s when we can compare polls to actual results.
My mental model is:
* Fundamentals – based on economic and past political data
* State polling average – based on polls in one state
* Adjusted polling average – takes into account polls in other states and helps adjust for some random variability in the state polling average
* Forecast of election day polling average – Higher uncertainty than the adjusted polling average, because it has all the current uncertainty + uncertainty about how much polls will move by election day. Because Morris states that the fundamentals help predict movement in the polls by election day, I’d expect the point estimate here to be different than the adjusted polling average (but this does not match what was shown in the model). One could also imagine a model which didn’t use the fundamentals as a predictor of polling movement, in which case the wider uncertainty with an average movement of zero across simulations would make sense.
* Election Forecast – Takes the election day polling average and adds uncertainty for systemic bias in the polls. I believe the methodology page says that fundamentals still are in play here and therefore the potential systemic bias across simulations does not have to average zero (although I believe Nate Silver’s model does drop the influence of fundamentals to zero by election day).
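The last two layers of this mental model can be sketched numerically. The sketch assumes the simplest version, the one "which didn’t use the fundamentals as a predictor of polling movement": drift in opinion is a mean-zero random walk whose variance grows with the days remaining, and systematic polling bias is a separate mean-zero error that stays the same size right up to election day. The function name and every parameter value are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def win_probability(margin, days_left, drift_sd_per_day, bias_sd):
    """P(final margin > 0) when drift and polling bias are independent normals.

    Assumes bias_sd > 0, so the total standard deviation is never zero.
    """
    drift_var = drift_sd_per_day ** 2 * days_left   # shrinks to 0 on election day
    bias_var = bias_sd ** 2                         # does not shrink
    total_sd = sqrt(drift_var + bias_var)
    return NormalDist().cdf(margin / total_sd)


# Hypothetical inputs: a 2-point lead in the adjusted polling average.
p_far = win_probability(margin=2.0, days_left=100,
                        drift_sd_per_day=0.3, bias_sd=2.5)
p_election_day = win_probability(margin=2.0, days_left=0,
                                 drift_sd_per_day=0.3, bias_sd=2.5)
print(round(p_far, 3), round(p_election_day, 3))
```

Holding the lead fixed, the win probability rises as the drift term vanishes, but it stays well short of certainty on election day because the bias term remains. A model in which the fundamentals predict polling movement would also shift the center of the distribution, which this sketch deliberately leaves out.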
The described methodology makes sense to me and I get the arguments for the methodological differences I mentioned. However, I’m unable to see how the model’s output matches the methodology. While Morris seems interested in explaining more, so far the explanations haven’t clarified much. See Nate Cohn (of the New York Times’s Upshot) comments here for the type of information which would quell my concerns: https://nitter.poast.org/Nate_Cohn/status/1813290590240702618 (scroll down, it’s across 4 tweets)
Andrew, what do you think of Ann Selzer’s polling? I heard an interesting discussion/interview with her on the New Yorker’s Radio Hour. She seems to be doing fairly sensible stuff, at first glance. One funny thing was that her company called her to get her to participate in her own poll (she refused to participate but didn’t explain why).
PS Then they interviewed some other guy who says, the average of the polls is what you need to know who’s ahead and who’s behind.
Shravan wrote:
“…some other guy who says, the average of the polls is what you need to know who’s ahead and who’s behind.”
Both Ann Selzer and this question have come up before on the blog. The “some other guy’s” claim is likely nothing more complex than that a bigger “n” makes for a better sample, but it isn’t really that simple.
If an election is expected to be decided by <2%, is it still OK to add in data from a pollster with an established bias that requires a correction factor of +/- 5% for one candidate? I don't think so. Instead of a better sample, what you get by adding in these dubious polls is variance loss, by swamping the polls that have high signal-to-noise with a lot of polls with very little signal. Add in enough garbage and the signal flatlines, producing a 50-50 split and a headline like "Polls show dead heat!"
So can Ann Selzer consistently produce a high-quality poll with a much smaller "n" that is more accurate than an average of many polls? I think that is well within the realm of possibility.
Shravan:
Hey, the New Yorker should’ve interviewed me! Regarding Selzer, yes, my understanding was that her polling did very well in the past. I have not studied the matter enough to offer any opinion on how well they will do in the future. That doesn’t mean I don’t think her poll will do well; I just don’t feel qualified to offer an opinion on the question.
It would be interesting to be able to play with the models to see how they respond to various perturbations, e.g. shifts ±1 in PA polls. I get the feeling that Nate’s and the Economist’s generally respond in comprehensible ways to plausible small changes, although not necessarily so to big unexpected changes, as you point out. The 538 model may not respond comprehensibly. I guess it’s unlikely that any modeler will set up toy sites to allow that sort of test.
Meanwhile, it’s worth a repeated warning for the legions of enthusiasts of the deeply flawed Lichtman model. He dichotomizes fuzzy subjective quantities. His prediction is also unreasonably dichotomous. He ignores genuine uncertainty about extremely close races and then excuses a missed prediction on one. He simply ignores that with small N and a large number of parameters to choose from you inevitably overfit. Although the real models have no variable for one-off situations like “one candidate is more senile than had previously been known” or “one candidate has new felony convictions” they do have generic uncertainty to allow for rare effects and do allow them to show up via polls. Lichtman’s doesn’t.
The 538 model may have some serious problems but the Lichtman poll-free model is a parody.
Michael:
I think that all three models would move in reasonable ways in response to small changes in the data. I have no reason to think that the Fivethirtyeight model would not respond comprehensibly. When it comes to large changes, all bets are off. Recall my post from 2020 describing how the Fivethirtyeight model at the time gave implausible results when conditioning on events such as Trump winning New Jersey.
All these models have been set up to give reasonable results with the sorts of data that we’ve been seeing in recent years. They have not been tuned to give reasonable results with unexpected data. For example, if you take the Economist model or Nate’s model, and you suppose that future polling first moves the Republicans up to 55% of the two-party vote share and then down to 45%, I think those models will send the Republican win probability up to 99% and down to 1%, the sort of thing that is an indication that the models violate the martingale property. The point is that those models are set up assuming that such big swings won’t happen, and I guess they probably won’t!
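For readers who want the martingale idea spelled out: a coherent forecast probability today should equal the average of the probabilities the same procedure will report in the future. Below is a rough simulation sketch, assuming a toy forecast in which the margin follows a normal random walk plus an election-day polling error. None of this is the Economist’s, Nate’s, or Fivethirtyeight’s code, and all parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

PHI = NormalDist().cdf


def win_prob(margin, days_left, assumed_daily_sd, bias_sd=2.0):
    """Toy forecast: P(win) for a random-walk margin plus election-day polling error."""
    return PHI(margin / sqrt(assumed_daily_sd ** 2 * days_left + bias_sd ** 2))


def mean_future_prob(margin, days_left, step, true_daily_sd, assumed_daily_sd,
                     n_sims=50_000, seed=1):
    """Average forecast `step` days from now when the margin really drifts
    with standard deviation true_daily_sd per day."""
    rng = random.Random(seed)
    future = [win_prob(margin + rng.gauss(0.0, true_daily_sd * sqrt(step)),
                       days_left - step, assumed_daily_sd)
              for _ in range(n_sims)]
    return fmean(future)


# Case 1: the forecast assumes the same drift that actually occurs.
now_ok = win_prob(2.0, 100, assumed_daily_sd=0.4)
later_ok = mean_future_prob(2.0, 100, 50, true_daily_sd=0.4, assumed_daily_sd=0.4)

# Case 2: the forecast assumes polls are much more stable than they turn out to be.
now_bad = win_prob(2.0, 100, assumed_daily_sd=0.1)
later_bad = mean_future_prob(2.0, 100, 50, true_daily_sd=0.4, assumed_daily_sd=0.1)

print(round(now_ok, 3), round(later_ok, 3))
print(round(now_bad, 3), round(later_bad, 3))
```

When the assumed and actual drift match, today’s probability and the average future probability agree up to simulation noise. When the forecast assumes more stability than the data deliver, today’s number is overconfident relative to where the forecast is headed on average, and the probabilities will swing around more than a martingale should.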
Regarding your last sentence above: again, I don’t see any good evidence that the Fivethirtyeight model has serious problems, except that Nate wrote that long post about it, where the key issue is the Fivethirtyeight model is allowing for bigger future opinion swings or polling biases than our models are allowing. I wouldn’t call this a serious problem, any more than I called it a serious problem with Nate’s model for Fivethirtyeight four years ago, which was doing pretty much the same thing, giving probabilities closer to 50/50 by allowing for the possibility of big future swings or polling biases.
The Lichtman thing, yeah, what can I say except that when I was younger there were bestselling books on ancient astronauts and the Bermuda triangle, in recent years we’ve seen adoring media attention on the Gospel of Jesus’s Wife, the work of Dan Ariely, Nudge, and UFOs. I guess it stands to reason that there’s room for some pseudo-social-science along with the pseudo-history, pseudo-cognitive-science, and all the other crap being pushed by NPR, Freakonomics, etc etc. In some ways, Lichtman annoys me more because he’s making pronouncements in my own field of research; in other ways, I find him kinda charming because he’s low-budget. He just wrote that one book and then he goes on TV or whatever; he’s not sucking up endless resources in the manner of Wansink etc.
Andrew- Yes, I followed your point that all these models can behave oddly under implausible big perturbations of input data. Nate is suggesting that 538 may be buggy or perhaps just weird, i.e. may behave oddly under even plausible input changes. That’s why it would be fun to be able to test if that’s right.
On Lichtman, it’s unnerving to see people I respect (e.g. an expert on airborne viral transmission) take him seriously.
p.s. If you don’t mind, I’d like to screenshot your Lichtman comments for response to some of his fans.
It’s all public! Repost what you’d like.
Regardless of what is going on with 538, ABC would be wise to give Morris as much time as possible to explain how his model is working, because it’s being mocked and attacked by popular media members.
Why didn’t Nate reply to you in previous years? Because your criticism didn’t get traction.
Is 538’s model strictly worse than previous 538 models? Maybe not, but it’s producing some bizarre results and it’s their main product.
Jb:
Regarding your last paragraph: I don’t think the results that the Fivethirtyeight model is producing in 2024 are any more bizarre than when in 2020 it produced a map allowing the possibility that Biden could win every state except New Jersey or when it displayed Trump winning California as in “the range of scenarios our model thinks is possible.” Those were pretty bizarre claims! I don’t think that made Fivethirtyeight’s 2020 forecast useless; it just meant that I wouldn’t have wanted to trust some of its more focused predictions.
The quick answer to why Nate hasn’t been replying to my emails is that he has better things to do. He’s done some work with poll aggregation and election forecasting, and I get the impression from reading his forthcoming book and other writings of his that in recent years he’s moved on, he’s more interested in other things, and he would prefer not to be thought of as the election poll guy. I interpret his election posts in 2024 as a product of annoyance, and he doesn’t really feel like pursuing the ideas further, for example by discussion with me or even by discussion with the current Fivethirtyeight team.
I wrote a post about this general point a few months ago. Back in 2012, when discussing arguments about forecasting, Nate wrote that blogs “lend themselves to an honest back-and-forth about the sausage of statistical conclusions, which can, hopefully, create a more respected class of experts and a more informed public.” In 2023, though, Nate wrote, “I don’t intend this a back-and-forth,” and in his recent post, he wrote that he wasn’t interested in “having big public debates about forecasting methodology. One reason is that I [Nate] find these arguments tiresome.” I take the difference as being that he sees poll aggregation as more of an obligation than as something fun.
I think that a big reason my criticism of Fivethirtyeight in 2020 did not get much attention is that I expressed it in a moderate way, at least as compared to Nate’s criticism of Fivethirtyeight in 2024. I didn’t analogize Nate to a notorious conspiracy theorist, I didn’t say he was trying to pass off a bug as a feature, and I didn’t analogize the forecast to the Titanic, I didn’t characterize it as “just some dude’s opinion,” and I didn’t write, “intentionally or not, he’s designed his model in such a way as to be nearly impervious to contrary evidence.”
Instead I wrote about “some odd tail behavior in the Fivethirtyeight election forecast” and my conjecture that “these wacky marginal and conditional probabilities came from the Fivethirtyeight team adding independent wide-tailed errors to the state-level forecasts,” along with my speculations that “Fivethirtyeight’s correlation matrix seems to be full of artifacts. . . . Maybe there was a bug in the code, but more likely they just took a bunch of state-level variables and computed their correlation matrix, without thinking carefully about how this related to the goals of the forecast and without looking too carefully at what was going on.”
At the technical level, my criticisms of the Fivethirtyeight forecast in 2020 and Nate’s criticisms of the Fivethirtyeight forecast in 2024 are pretty similar: in both cases, we’re saying that we think the uncertainties are too wide, in both cases we’re pointing to behavior that seems wrong involving the overlay of state and national forecasts, in both cases we’re raising the possibilities of bugs in the code, conceptual errors in setting up the model, and fudge factors gone wrong. In both cases we’re criticizing the model from the outside (which I think is valuable!). In 2020, I expressed skepticism that the Fivethirtyeight forecast allowed the possibility that Biden could win every state except New Jersey, and in 2024 Nate has some issues with the Fivethirtyeight forecasts in Wisconsin.
Also in both cases the criticisms came from the outside. In 2020, the Fivethirtyeight forecasting procedure was not fully described, and so I had to reverse-engineer it, which I did by analyzing posted simulations of the fifty-state forecast. In 2024, the Fivethirtyeight forecasting procedure is not fully described, which led Nate to write, “some of the internal workings of the model are strange, or at least appear that way based on the information Morris has made publicly available.” After my experience in 2020, I get Nate’s frustration: it can be hard work to try to figure out what’s going on, just analyzing output, without having a full description of the method.
In 2020, Nate wrote, “We think it’s appropriate to make fairly conservative choices especially when it comes to the tails of your distributions”; in 2024, he writes that Fivethirtyeight “badly exaggerates the amount of polling movement we’re likely to see . . . but polls now are much less “swingy” than they once were . . . “drift” is much less than it once was: the polls hone in toward their final margin earlier since few people’s votes are actually up for grabs.” This is the crux: in 2020, Nate felt there was high uncertainty (Covid, remote voting, etc.) and he tuned the Fivethirtyeight model to express this (wide forecast intervals for Florida, displaying Trump winning California as in “the range of scenarios our model thinks is possible,” etc.); in 2024 he does not anticipate big national swings between now and the election, hence his disagreement with Fivethirtyeight’s wide forecast intervals and high uncertainty.
In both cases, I think the models do some weird things; my criticisms of Fivethirtyeight in 2020 came in part from differences in opinion regarding how the election might go (I thought his state-level uncertainty intervals were implausibly wide) and some issues I identified as artifacts (the thing about New Jersey, for example). I see Nate’s criticisms of Fivethirtyeight in 2024 as having a similar flavor. But he’s expressing them in stronger terms. It could be that if I’d expressed my criticisms in stronger terms in 2020, those criticisms would’ve got more traction—but I didn’t want them to get “traction,” if that meant that people had the impression that I thought the Fivethirtyeight model was terrible. I didn’t think it was terrible, I just thought it had some problems, partly arising from different choices about uncertainty and partly from complexities in the multivariate prediction.
Here’s an analogy. After Case and Deaton published their article on deaths of despair, I wrote some posts and published an article arguing that some of their results came from an error in their data analysis: they’d adjusted by decades of age but not performed age adjustment at a finer level, and as a result they had misleading patterns. I was annoyed, first that they’d made this basic error—analyzing mortality data without carefully adjusting for age—and second because, after I’d made these criticisms, they lashed out in a defensive way. This one was in the news, and I was contacted by a news reporter who was writing a story on the subject. He asked what I had to say. I said that I thought that Case and Deaton had made an error, and this led them to mischaracterize the trends in the data—but, still, it did not invalidate the main pattern they’d found, which was a comparison between trends in the U.S. and in other countries. The reporter said that he couldn’t do much with this mixed message: I got the impression that he’d have been happier if I’d just flat-out slammed Case and Deaton. That would’ve made a better story. For their part, Case and Deaton responded in a moderate way. I wasn’t thrilled with how they framed the criticism offered by me and others, but they didn’t say we were wrong, either. So there was no big news story of a “demographics war.”
I agree with you that, given the aggressiveness of Nate’s criticisms, it could make sense for Fivethirtyeight to reply in detail. Ideally the two groups could talk with each other—I wish Nate would read my posts too!—but I know from experience that once there is an intellectual disagreement, it can be hard to find common ground or even have a cordial discussion.
What was useful about 538’s 2020 forecast was that in spite of the huge uncertainty it presumed, it still gave Biden an 89% chance of winning. That was very comforting for Democrats when the betting markets only slightly favored Biden (~62%).
In 2020, I was more in line with the Economist model, which had Biden around 95%.
But the results were much closer than expected, notably with Wisconsin polls being off by ~8 points (Biden +9 in the polls, Biden +1 in the result), and with polls favoring Biden in Florida/North Carolina and giving him a chance in Iowa/Ohio.
That election showed that conventional wisdom was pretty valuable, since Saagar Enjeti correctly predicted Biden winning 306-232 with all 50 states correct, and he didn’t over-focus on polls.
Going forward, I think prediction markets do a good job at individual states (they don’t have big misses like Florida being 75% Biden), and then the best model should be able to combine those 53 forecasts into 1 overall presidential forecast with less uncertainty than betting markets.
Currently betting markets say around 62% Trump which is a fair middle ground between the Silver/Economist/JHK (Trump > 70%) models that trust polls which have Trump winning every swing state too much and 538’s model which has too much uncertainty and doesn’t provide any value (I had this critique back when they released it in May for way too huge error bars considering it’s a rematch of 2020, so it should be very similar with say a Biden +2 popular vote and an independent standard deviation of say 3-5% for each state. They vastly overstated the probability Trump wins popular vote and loses EC since they had Texas/Florida plausibly going to Biden).
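How much uncertainty survives when state forecasts are combined depends heavily on whether the state errors are treated as independent or as sharing a national swing. Here is a rough simulation sketch of that point. Everything in it is an assumption for illustration: the seven states, their electoral votes, and their expected margins are invented, and the two scenarios are set up so that each state has the same total standard deviation (4 points) either way:

```python
import random

# Hypothetical swing states: (electoral votes, expected margin for candidate X in points).
STATES = [(10, 1.0), (15, 2.0), (16, 1.5), (19, 0.5), (11, 2.5), (6, 1.0), (16, 2.0)]
TOTAL_EV = sum(ev for ev, _ in STATES)  # 93, an odd total, so no ties


def overall_win_prob(national_sd, state_sd, n_sims=20000, seed=1):
    """Share of simulations in which candidate X wins a majority of the toy electoral votes."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        # One shared national error per simulation, plus a separate error for each state.
        swing = rng.gauss(0.0, national_sd)
        ev_won = sum(ev for ev, margin in STATES
                     if margin + swing + rng.gauss(0.0, state_sd) > 0)
        wins += ev_won * 2 > TOTAL_EV
    return wins / n_sims


# Same per-state sd of 4 points in both cases: sqrt(3.2**2 + 2.4**2) = 4.
p_independent = overall_win_prob(national_sd=0.0, state_sd=4.0)
p_correlated = overall_win_prob(national_sd=3.2, state_sd=2.4)
print(f"independent state errors: {p_independent:.2f}")
print(f"shared national swing:    {p_correlated:.2f}")
```

In this toy setup the independent version should come out noticeably more confident, because independent state errors tend to cancel when summed over states while a shared national error does not. So "less uncertainty than betting markets" can be partly a consequence of the independence assumption rather than of better information.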
Andrew, thank you for this series of posts, which reveal some of the behind-the-scenes of polling/forecasting this election cycle, and indeed for your posts from previous cycles.
I have one issue I’d like to raise, which is based on this parenthetical in your main post:
> (in particular, the risk that without a large enough potential error term in the national swing, his forecast could too quickly go to one candidate or the other reaching a 99% win probability)
I think you — or some of the regulars in the comments here — have addressed this sort of thing many times before, but: isn’t this a kind of “early-stopping”? One sets up the experiment/simulation, which is “pre-registered” to be verified/give a result at a certain date, but one secretly opens it up to make sure that it isn’t giving a definitive result too early in the cycle?
Or would you characterize it as a way of avoiding the OPERA FTL issue (which they only made public as they struggled to find the systematic error causing their anomalous measurement), where you’re just checking that there are no loose cables?
It’s hard to say that “a 99% chance” that a given candidate will win in November is wrong, provided there’s no error in the calculation — it’s just a function of your model. Although I would say that a model predicting a win for a third-party candidate or electoral college tie being >99%, if based on the same data as the models you’ve discussed, must have some radical inconsistencies in its innards.
He would have to be aware of it in order to dismiss it. Do we know of him ever acknowledging its existence?
Wonks:
Nate acknowledged some of our critiques but dismissed them in a way that suggested he hadn’t fully processed what we were saying, or maybe he chose not to process it. He wrote something about how we didn’t understand what he was doing but he didn’t offer any justification, for example, for the map showing Biden winning every state except New Jersey. And I’ve been in email contact with him from time to time, so I’m pretty sure (though not completely certain) that he saw the things I sent to him and chose not to respond. As discussed in the above post, I can kind of understand his motivation for not engaging with my criticism—but it’s not a choice that I would’ve made, had I been in his place. Informed criticism is hard to find, and when I get it, I treasure it. That’s one reason I took the job that I do.
Re-reading the “Reverse-engineering” post I see there’s a P.P.S section with Silver responding to the Economist.
Wonks:
Yes, he did. He wrote, “the Economist guys tend to bring up stuff that’s more debatable than wrong, and which I’m pretty sure is directionally the right approach in terms of our model’s takeaways, even if you can quibble with the implementation,” without explaining how it’s “more debatable than wrong” or “directionally right” to have a scenario in which Biden wins every state except New Jersey, etc.
I don’t think Nate owed us any response. If he doesn’t want to publicly learn from our critiques, that’s his call. His response just seemed dismissive and defensive. He could’ve held his ground while saying something more reasonable, like, “Hey, forecasting is hard, and, yeah, you found some weird stuff in the tails of our forecast. Ultimately we’re more concerned about getting a reasonable and non-overconfident forecast for swing states and the electoral college, and given our current methods the way we did this was to put in some error terms that gave us that weird tail behavior. I appreciate the Economist guys pointing this out, but right now I don’t have time to try to fix this, and I think that for practical uses our forecast is the best out there. It’s not like anyone’s going to use our model to answer theoretical questions like, ‘If Trump wins New Jersey, what’s the probability he wins Alaska.'”
If Nate had written the above, I’d understand. My models aren’t perfect either, and I’d respect a response that said that, from a statistical perspective, he’s picking his battles. We all do. But I was disappointed by his dismissal, followed by a lack of engagement with our criticisms. He gave up an opportunity to learn! What’s the point of that?? High-quality criticism doesn’t come every day.
I think one thing that you are underdiscussing is that the 538 model didn’t give each party a 50% chance of winning; it actually gave Biden a *greater* than 50% chance of winning. I think people just intuitively found this implausible given the polling.
I think that there would have been less criticism of the model if it had consistently given Biden a less than 50% chance, even if it was in the 45-49% range. There probably still would have been criticism that it was “useless” or too uncertain, but I don’t think as many people would have objected to it so strongly.
In other words: Trump was leading in the polls in all key states and even nationally. If you wanted to say that polling this far out was so unreliable that Trump was only a slight favorite (e.g. 51-55%), I think people could see that as defensible if a bit “useless”. But to instead say that you trust the fundamentals so much more that Trump actually becomes the underdog (even slightly) seemed implausible to a lot of people.
Just one more illustration: for April 8, the 538 model had Biden at a 62% chance of winning, whereas the Economist had Trump at a 53% chance of winning. So it’s not the case that the 538 was just consistently more “uncertain” (i.e. closer to 50%) than the other models; it actually had a “bias” in favor of Biden relative to the other models, which led to results that people found implausible in July given the state of polling.
Asher:
I don’t think it’s accurate to say that the Fivethirtyeight model had a “bias”; its fundamentals-based model predicted an advantage for the Democrats, and its polling model allowed for the possibility of large opinion swings. That’s not a bias; it’s just what popped out of the model. It’s fine to say that you and others found the model’s prediction to be implausible (indeed, it differed from the prediction from our Economist model); once you have that impression, it makes sense to go back to see what aspects of the model you disagree with.
Andrew, your post is a bit too long and, more importantly, IMO, does not even address the only issue that is relevant in all your discussion of previous polling, etc. etc.
I do not have to research anything related to bygone polling to know the one simple fact that, in my experience of following elections beginning with the 1952 Eisenhower/Stevenson one, there has never been anything resembling what has just occurred with Biden dropping out with just over 100 days to the election.
The only other previous really unusual election which may have had wide polling swings due to what was happening with the candidates, was the 1992 Ross Perot – I’m running/no, I’m not/yes, I am – election in which the nutjob ended up getting nearly 19% of the vote.
You, and most of your responders, can talk theoretical/”scientific” silliness all you want, but as far as predicting what will happen in the next 14 weeks in the polls, you have no valid basis for those predictions.
Timothy:
As the saying goes, the alternative to “good statistics” is not “no statistics”; it’s “bad statistics.” Like it or not, political professionals, media organizations, and consumers of news will be making election forecasts using available information from economic conditions, polls, previous elections, candidates’ track records, etc. The purpose of my above post is to try to understand the mapping between assumptions and conclusions in the forecasts that are out there. Again, if you were able to turn off all the forecasts and poll aggregations out there, it wouldn’t stop the forecasting; people would just be grabbing whatever nearby polls they could find and extrapolating from there. I prefer a more formal forecasting process, while recognizing its imperfections.
As for the post being too long . . . What can I say? Pay me a bit more and I’ll write something more concise for you!
In any case, I appreciate the feedback, and I understand that different readers are getting different things out of our posts.
I’m going to offer a guess.
After Biden dropped out, there are significantmh fewer voters who didn’t want to vote for Trump but weren’t sure they were going to vote for Biden.
It’s not so much that they all love them some Harris, but the “pox in both their houses” number has dropped. I just saw some polling numbers that the “hate both candidates” number has dropped significantly since Biden dropped out. I don’t have a ton o’ confidence in that polling veracity, but hearing that evidence was a great way for me to confirm my bias.
And I love me some bias confirmation.
Oy
Should be
After Biden dropped out, there are significantly fewer voters who didn’t want to vote for Trump but weren’t sure they were going to vote for ~~Biden~~ the Democratic Party candidate.
The pollsters and the main polling data web sites appear mostly biased democrat so they may be right half the time 😝 and take credit for it as per their “superior intellect” half the time 😆. No lie. No joke. Not hyperbole.