“Political Prediction and the Wisdom of Crowds”: Evaluating an election forecast over time by comparing to betting odds over time

Rajiv Sethi, Julie Seager, et al. write:

We evaluate the relative forecasting performance of three statistical models and a prediction market for several outcomes decided during the November 2024 elections in the United States: the winner of the presidency, the popular vote, fifteen competitive states in the Electoral College, eleven Senate races, and thirteen House races. We argue that conventional measures of predictive accuracy such as the average daily Brier score reward modeling flaws that result in predictable reversals, as long as such movements are in a direction that is aligned with the eventual outcome. Instead, we adopt a test based on the idea that the strength of a model can be measured by the profitability of a trader who believes its forecasts and bets on the market based on this belief. . . . We find that all models failed to beat the market in the headline contract but some did so convincingly in contracts referencing less visible races.

They continue:

The ability of prediction markets to absorb novel sources of information and respond rapidly to unfolding and unprecedented events is a strength relative to statistical models, which are built and calibrated based on an assumption that the past will remain a good guide to the future. But markets also have weaknesses relative to models, being prone to excess volatility and occasionally vulnerable to price manipulation. The question of whether markets or models are more accurate on average is therefore an empirical one, and cannot be answered based on logical reasoning alone. In this paper, we examine this empirical question using data from three statistical models—FiveThirtyEight [Elliott Morris], the Economist [Dan Rosenheck, Ben Goodrich, Geonhee Han, and me], and Silver Bulletin [Nate Silver]—and the Polymarket exchange, which was the only venue on which contracts for a broad range of electoral outcomes were listed for the entire period from early August until election day on November 5.
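
To make the two kinds of evaluation concrete, here is a minimal Python sketch. It is not the procedure in the paper: the trading rule (bet a fixed stake each day on whichever side the model favors relative to the market, and hold to resolution) is a simplifying assumption, the function names are hypothetical, and the probabilities are made-up numbers, not data from any of the forecasts or from Polymarket.

```python
from typing import Sequence


def average_daily_brier(probs: Sequence[float], outcome: int) -> float:
    """Mean squared error of daily probabilities for one binary event.

    `outcome` is 1 if the event happened and 0 otherwise. Lower is better.
    """
    return sum((p - outcome) ** 2 for p in probs) / len(probs)


def trader_profit(
    model_probs: Sequence[float],
    market_prices: Sequence[float],
    outcome: int,
    stake: float = 1.0,
) -> float:
    """Profit of a hypothetical trader who believes the model.

    Assumed rule (an illustration, not the paper's test): each day the
    trader spends `stake` dollars on YES contracts if the model's
    probability is above the market price, on NO contracts if it is
    below, and holds every position until the event resolves.
    """
    profit = 0.0
    for p, q in zip(model_probs, market_prices):
        if not 0.0 < q < 1.0:
            raise ValueError("market prices must be strictly between 0 and 1")
        if p > q:
            # stake / q YES contracts, each paying $1 if the event happens
            profit += stake / q * outcome - stake
        elif p < q:
            # stake / (1 - q) NO contracts, each paying $1 if it does not
            profit += stake / (1.0 - q) * (1 - outcome) - stake
    return profit


# Made-up four-day example in which the event ends up happening.
market = [0.5, 0.5, 0.5, 0.5]
steady_model = [0.6, 0.6, 0.6, 0.6]
jumpy_model = [0.9, 0.4, 0.9, 0.4]

for name, model in [("steady", steady_model), ("jumpy", jumpy_model)]:
    print(name, average_daily_brier(model, 1), trader_profit(model, market, 1))
```

The point of the sketch is only that the two measures answer different questions: the Brier score grades each day's probability against the final outcome, while the trading measure grades the model's disagreements with the market.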

I’m pretty sure that if the Economist had run with Ben Goodrich’s ideas when putting together their presidential election forecast (see section A.2 of this paper), we would’ve performed better in Sethi et al.’s evaluation.

This is not to say that anyone but Ben deserves credit for that (hypothetically) better performance; we ultimately made the decision to go with the simpler model. My point here is only the familiar one that, those long juicy time series notwithstanding, ultimately this is only a sample of size 1, first because this is all based on a single national election and second because the outcome of the evaluation can depend so much on a single choice we made during our modeling and implementation process.

The idea of evaluating a forecast by comparing it to market prices is interesting, and it sends my thoughts in two opposite directions:

1. Given that a market exists, it makes sense to evaluate any outside information (in this case, public forecasts) based on what it adds in predictive power to the market's own forecast. Richard Clarida explains this idea in chapter 9 of our book, A Quantitative Tour of the Social Sciences.

2. Conversely, market prices are presumably influenced by public forecasts and, beyond that, new polling information shifts the markets and forecasts together. A few days before the election we discussed an aberrant poll from Iowa, which shifted both betting markets and forecasts.

Putting these perspectives together, it could make sense not just to have markets and forecasts compete but to ask where markets will do better and where forecasts will do better.

In general I’d expect markets to do better with one-of-a-kind information and forecasts to do better with numerical data that is part of an ongoing process.

For example, it was not clear in a forecast how to model information from new voter registrations, data from neighbor polls, or perceptions vs. reality of inflation. But these are factors that markets can incorporate in some ways.

Incorporating that Iowa poll, though, is the sort of thing that a forecast can do very well. Bayesian inference and partial pooling (across states and regions, over time, and among poll organizations) do not come naturally to people, but a model-based forecast can just crunch and include that new information easily. It won’t be perfect, but accounting for new polls is in the wheelhouse of our election forecasting models. This suggests that if you’re betting, you might want to go with market odds but then use the shift in the public forecasts to get a sense of how much your predictions should change given this new piece of information.
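
Here is a rough sketch of both halves of that suggestion in Python. The first function is the textbook normal-normal update, the simplest form of partial pooling; it is far cruder than any real election model. The second carries a forecast's change over to a market price by adding the change on the logit scale, which is an assumption chosen for illustration, not something the post or the paper prescribes. All function names and numbers are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pooled_estimate(
    prior_mean: float, prior_sd: float, poll_mean: float, poll_sd: float
) -> tuple[float, float]:
    """Precision-weighted average of a prior estimate and one new poll.

    This is the normal-normal Bayesian update: the noisier the poll
    (larger poll_sd), the less it moves the estimate away from the prior.
    Returns the posterior mean and standard deviation.
    """
    w_prior = 1.0 / prior_sd**2
    w_poll = 1.0 / poll_sd**2
    post_mean = (w_prior * prior_mean + w_poll * poll_mean) / (w_prior + w_poll)
    post_sd = math.sqrt(1.0 / (w_prior + w_poll))
    return post_mean, post_sd


def shift_market_prob(
    market_prob: float, forecast_before: float, forecast_after: float
) -> float:
    """Move a market probability by the forecast's logit-scale change.

    Assumption for illustration: the bettor keeps the market's level but
    borrows the size of the forecast's reaction to the new information.
    """
    delta = logit(forecast_after) - logit(forecast_before)
    return inv_logit(logit(market_prob) + delta)


# Made-up numbers: a prior margin of -8 points (sd 3) and one surprising
# poll showing +3 (with an effective sd of 4, including non-sampling error).
mean, sd = pooled_estimate(prior_mean=-8.0, prior_sd=3.0, poll_mean=3.0, poll_sd=4.0)
print(f"pooled margin: {mean:.2f} (sd {sd:.2f})")

# Made-up numbers: the forecast moved from 0.50 to 0.55 after the poll,
# and the market stood at 0.60 beforehand.
print(f"adjusted market probability: {shift_market_prob(0.60, 0.50, 0.55):.3f}")
```

The logit scale is used so that the adjusted probability stays between 0 and 1; a shift on the raw probability scale would be another defensible choice.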

9 thoughts on ““Political Prediction and the Wisdom of Crowds”: Evaluating an election forecast over time by comparing to betting odds over time”

  1. “The ability of prediction markets to absorb novel sources of information and respond rapidly to unfolding and unprecedented events…”

    Like what?

    My suspicion is that we get these vague platitudes about markets because proponents need to sort of talk around what they think is really happening. How can a betting market be better than the best statistical model created by the smartest people?

    A betting market is a black box that everyone hopes incorporates insider knowledge. What else could it be? What is the secret sauce if not insider information?

    • Matt:

      “Unfolding and unprecedented events” is not a vague platitude! An obvious example that Sethi et al. give for the 2024 election is Biden dropping out. People were talking about this happening for a few weeks before it actually occurred. The markets were able to handle this, but it was hard for forecasts to do much with this one-time event. Another example is Trump getting shot at.

      • “The markets were able to handle this”

        The claim that “they handled this” does not add anything here. The vague part no one wants to talk about is HOW they handled it.

        Insider information aside, the belief here has to be that the collective intuition of some random assemblage of people is more accurate than any rational approach. And this collective “gut feeling” works fast, presciently and correctly adding 2 points to Trump’s projected vote total after he was shot at, or whatever.

        But no, that would be ridiculous. There is secret sauce! This is not a random assemblage of people. These bets are being made by people who are rich enough to bet on elections, and they could not have gotten that way unless they are good at it. I have seen this claim on this very blog.

        If collective intuition yields the most accurate prediction, why waste time on statistical models when we should be developing algorithms to find the right people to ask so they can just tell us who will win?

        I apologize for the snark, I don’t know how to discuss this topic without it sounding snarky.

        • Matt:

          You ask, “If collective intuition yields the most accurate prediction, why waste time on statistical models when we should be developing algorithms to find the right people to ask so they can just tell us who will win?”

          My response is that statistical models work best when there’s strong theory and lots of replication, not as well in one-off events. That’s what I said in my post above. If the question is how much to shift a forecast given a new poll, I’d go with the models. If the question is how much to account for one of the candidates possibly dropping out . . . well, the models didn’t include that at all! So, yeah, there I’d say that bettors’ behavior is better than nothing. Nowhere did I claim that “collective intuition yields the most accurate prediction.”

        • Andrew wrote:
          “So, yeah, there I’d say that bettors’ behavior is better than nothing. Nowhere did I claim that ‘collective intuition yields the most accurate prediction.'”

          My original point was that proponents of betting markets have a tendency to obliquely refer to the data generating process using vague terms, which I find frustrating because I want to know what people think is really going on.

          The simplest possible explanation for what could make betting markets “better” is that the data generation is guided by collective intuition, which is why I used that term.

          So then I get pushback on the use of the term “collective intuition,” with a preference for the much vaguer term “bettor’s behavior,” and we have come full circle.

          I sarcastically used the term “special sauce.” It fits together like this:

          [bettor’s behavior] – [collective intuition] = [special sauce]

          Why is the vague term “bettor’s behavior” a better description of the data generating process in betting markets than the actually meaningful “collective intuition?” What is the delta that made you disavow one term and offer the other?

          If bettors are using the same publicly available information as the modelers, then the only possible advantages the markets could have are either better use of that information, or timeliness. I find it hard to believe that a group of rando bettors would do better than the best minds and the best models, and I don’t think others are thinking that way either.

          If, on the other hand, betting markets somehow obtain information not available to the public, we have something else going on. Now the bettors are using – by definition – insider information. If this is the case, there is not much of scientific interest. The fact that cheaters prosper when they successfully cheat is not noteworthy.

          Put it all together, and the study of betting markets only looks interesting if we insist on treating the data-generating model as a black box, which seems to me to be what is going on here unless someone can explain otherwise.

    • Insider information could be one of the reasons, but can’t it also just be more banal frictions like:

      – Gathering data requires time and money and as a result statistical models take longer to update and sometimes don’t update for information the modelers believe is irrelevant
      – Surveys introduce their own uncertainty (e.g. respondents not answering truthfully, or selection bias in who responds); even if you can account for this, doing so takes time and fails to get rid of all uncertainty
      – Modelers/organizations that model have their own set of biases and so have blind spots to certain events/methods
      – Not everything can be captured by data/figuring out what and how to measure can be difficult

      All of this points to what Andrew mentions in the post: there must be certain types of situations where a market would do better than a forecaster.

    • “Polymarket has market failures. In Polymarket “will Jesus Christ return in 2025?” has a 3% probability. …”

      As I understand it (going by my recollection of a newsletter by Matt Levine) this isn’t really a market failure but reflects details of how the market is run. In this case if you bet no you have to put up $.97 (or whatever now) in order to be paid $1 at year end. So in effect you are just getting paid interest on your money. But the market rules allow you to combine a yes bet and a no bet and be paid $1 now. So the yes bet is worth $.03 as it allows you to close out your no bet early.
