On the ethics of pollsters or journalists or political scientists betting on prediction markets

There’s been a bit of concern lately about political consultants or pundits offering some mix of private and public forecasting and advice, and also making side bets on elections. I don’t know enough about these stories to comment on them in any useful way. Instead I’ll share my own perspectives regarding betting on elections.

In June 2020 I wrote a post about an opportunity in a presidential election prediction market—our model-based forecast was giving Biden an 84% chance of winning the election, whereas the market’s implied odds were 53% for Biden, 40% for Trump, 2% for Mike Pence, 2% for Hillary Clinton (!), and another few percent for some other possible longshot replacements for Biden or Trump.

Just to be clear: Those betting odds didn’t correspond to Biden getting 53% of the vote, they corresponded to him having a 53% chance of winning, which in turn basically corresponded to the national election being a tossup.

I thought seriously about laying down some $ on Biden and then covering it later when, as anticipated, Biden’s price moved up.

Some people asked why I wasn’t putting my money where my mouth was. Or, to put it another way, if I wasn’t willing to bet on my convictions, did I really believe in my own forecast? Here’s what I wrote:

I agree that betting is a model for probability, but it’s not the only model for probability. To put it another way: Yes, if I were planning to bet money on the election, I would bet using the odds that our model provided. And if I were planning to bet a lot of money on it, I guess I’d put more effort into the forecasting model and try to use more information in some way. But, even if I don’t plan to bet, I can still help to create the model as a public service, to allow other people to make sensible decisions. It’s like if I were a chef: I would want to make delicious food, but that doesn’t mean that I’m always hungry myself.

Ultimately I decided not to bet, for a combination of reasons:

– I didn’t quite know how to do it. And I wasn’t quite sure it would be legal.

– The available stakes were low enough that I couldn’t make real money off it, and, if I could’ve, I would’ve been concerned about the difficulty of collecting.

– The moral issue that, as a person involved in the Economist forecast, I had a conflict of interest. And, even if not a real conflict of interest, a perceived conflict of interest.

– The related moral issue that, to the extent that I legitimately am an expert here, I’m taking advantage of ignorant people, which doesn’t seem so cool.

– Asymmetry in reputational changes. I’m already respected by the people that matter, and the people who don’t respect me won’t be persuaded by my winning some election bets. But if I lose on a public bet, I look like a fool. See the last paragraph of section 3 of this article.

Also there’s my article in Slate on 19 lessons from the 2016 election:

I myself was tempted to dismiss Trump’s chances during primary season, but then I read that article I’d written in 2011 explaining why primary elections are so difficult to predict (multiple candidates, no party cues or major ideological distinctions between them, unequal resources, unique contests, and rapidly changing circumstances), and I decided to be careful with any predictions.

The funny thing is that, in Bayes-friendly corners of the internet, some people consider it borderline-immoral for pundits to not bet on what they write about. The idea is that pundits should take public betting positions with real dollars cos talk is cheap. At the same time, these are often the same sorts of people who deny that insider trading is a thing (“Differences of opinions are what make bets and horse races possible,” etc.) It’s a big world out there.

Real-world prediction markets vs. the theoretical possibility of betting

Setting aside the practical and ethical problems in real-world markets, the concept of betting can be useful in fixing ideas about probability. See for example this article by Nassim Taleb explaining why we should not expect to see large jumps up and down in forecast probabilities during the months leading up to the event being forecast.

This is a difficult problem to wrestle with, but wrestle we must. One challenge for election forecasting comes in the mapping between forecast vote share and betting odds. Small changes in forecast vote shares correspond to big changes in win probabilities. So if we want to follow Taleb’s advice and keep win probabilities very stable until shortly before the election (see figure 3 of his linked paper), then that implies we shouldn’t be moving the vote-share forecast around much either. Which is probably correct, but then what if the forecast you’ve started with is way off? I guess that means your initial uncertainty should be very large, but how large is reasonable? The discussion often comes up when forecasts are moving around too much (for example, when simple poll averages are used as forecasts, then predictable poll movements cause the forecasts to jump up and down in a way that violates the martingale property that is required of proper probability forecasts), but the key issue comes at the starting point.
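To make the steepness of that mapping concrete, here is a minimal sketch in R. It assumes, purely for illustration, a normal forecast distribution for the two-party vote share, a win threshold of 50%, and a forecast standard deviation of 0.02; the names and numbers are hypothetical and do not come from any actual forecasting model.

```r
# Illustrative assumptions (not taken from any real forecast):
win_threshold <- 0.50   # share needed to win
forecast_sd   <- 0.02   # uncertainty in the final vote share

# Forecast mean vote shares, one percentage point apart
mean_share <- c(0.49, 0.50, 0.51, 0.52, 0.53)

# Win probability under a normal forecast distribution
win_prob <- pnorm((mean_share - win_threshold) / forecast_sd)

print(data.frame(mean_share = mean_share, win_prob = round(win_prob, 3)))
```

Under these assumptions, moving the forecast share from 50% to 51% moves the win probability from 50% to about 69%, which is why a stable win probability requires a very stable vote-share forecast.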

So, what about future elections? For example, 2024. Or 2028. One issue that came up with 2020 was that everyone was pretty sure ahead of time, and also correct in retrospect, that the geographic pattern of the votes was aligned so that the Democrats would likely need about 52% of the vote to win the electoral college. So, imagine that you’re sitting in 2019 trying to make a forecast for the 2020 presidential election, and, having read Taleb etc., you want to give the Democrats something very close to a 50/50 chance of winning. That would correspond to saying they’re expected to get 52% of the popular vote. Or you could forecast 50% in the popular vote but then that would correspond to a much smaller chance of winning in the electoral college. For example, say your forecast popular vote share, a year out, is 0.50 with a standard error of 0.06 (so the Democratic candidate has a 95% chance of receiving between 38% and 62% of the two-party vote), then the probability of them winning at least 52% is pnorm(0.50, 0.52, 0.06) = 0.37. On the other hand, it’s been a while since the Republican candidate has won the popular vote . . . You could give arguments either way on this, but the point is that it’s not so clear how to express high ignorance here. To get that stable probability of the candidate winning, you need a stable predicted vote share, and, again, the difficulty here isn’t so much with the stability as with what your starting point is.
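The arithmetic in the example above can be checked with a few lines of R. The mean, standard error, and 52% requirement are the ones given in the text; the grid of alternative standard errors at the end is an added illustration, not part of the original argument.

```r
mean_share <- 0.50   # forecast two-party vote share (from the text)
se_share   <- 0.06   # forecast standard error (from the text)
needed     <- 0.52   # share needed to win the electoral college (from the text)

# 95% interval for the vote share under a normal forecast distribution
interval_95 <- mean_share + c(-1, 1) * qnorm(0.975) * se_share

# Probability of reaching the needed share
p_win <- 1 - pnorm(needed, mean = mean_share, sd = se_share)

# The expression written in the text gives the same number by symmetry
p_text <- pnorm(0.50, 0.52, 0.06)

# Illustrative only: how the answer depends on the assumed standard error
se_grid    <- c(0.02, 0.04, 0.06, 0.10)
p_win_grid <- 1 - pnorm(needed, mean = mean_share, sd = se_grid)

print(round(interval_95, 2))
print(round(c(p_win = p_win, p_text = p_text), 2))
print(round(p_win_grid, 2))
```

Widening the assumed standard error pulls the win probability toward 50%, but it stays below 50% for as long as the forecast mean sits below the needed share. That is one way to see why the starting point matters more than the stability.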

Summary

Thinking about hypothetical betting odds can be a useful way to understand uncertainty. I’ve found it helpful when examining concerns with my own forecasts (for example here) as well as identifying problems with forecast-based probability statements coming from others (for example here and here).

Actual betting is another story. I’m not taking a strong moral stance against forecasters also making bets, but I have enough concerns that I’m not planning to do it myself. On the other hand, I’m glad that some low-key betting markets are out there; they provide some information even if not as much as sometimes claimed. Rajiv Sethi discusses this point in detail here and here.

37 thoughts on “On the ethics of pollsters or journalists or political scientists betting on prediction markets”

  1. I bet large on Biden at average odds of 1.88 (52.3%). Still not sure if it was a bad bet (due to bankroll, not likelihood) or if polling became distorted during the campaign.

  2. You wrote: “But if I lose on a public bet, I look like a fool.”

    Why? Is it because so many people are innumerate?

    If you bet stake $1 to win $10^6 on a fair coin flip, you would lose half the time. But few would think you a fool. Of course, if you stake $10^6 to win $1 on a coin flip, even if you won, many would think you a fool.

    Prudential pays on insurance claims all the time. But they claim to have about a 10% ROE. They don’t seem to be seen as fools.

    • Bob:

      Let me put it this way: publicity is a two-way street. If I bet quietly and win or lose money, that won’t affect my reputation at all. If I bet loudly, publicizing my bet ahead of time, then I look good if I win, bad if I lose (for simplicity let’s just consider even-money bets here). I agree with you that this isn’t really right, unless we have data from a large sequence of bets. But if winning a bet makes me look good, then losing should make me look bad. People are still mocking Thomas Friedman for saying the Iraq War would be over in 6 months, and that’s fair, in the sense that he was the one making the public pronouncement.

      • It is not my position that your view “But if I lose on a public bet, I look like a fool.” is incorrect. Rather I was trying to draw out why you, someone who has spent many, many hours thinking about both probability and about communicating understanding of probability to others, have this concern.

        There is a distinction between a bad outcome (losing $1 on the $1 to $10^6 coin flip bet) and a bad decision (failing to accept that bet on the $1 side).
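The distinction between a bad outcome and a bad decision can be put in numbers with a short R sketch of the two coin-flip bets described in this thread. The helper `bet_ev` is a hypothetical name introduced here for illustration.

```r
# Expected profit of a bet: win `payout` with probability p_win,
# otherwise lose `stake`. (Hypothetical helper for illustration.)
bet_ev <- function(stake, payout, p_win) {
  p_win * payout - (1 - p_win) * stake
}

# Stake $1 to win $10^6 on a fair coin: lost half the time, yet a great bet
ev_long_side <- bet_ev(stake = 1, payout = 1e6, p_win = 0.5)

# Stake $10^6 to win $1 on a fair coin: won half the time, yet a terrible bet
ev_short_side <- bet_ev(stake = 1e6, payout = 1, p_win = 0.5)

print(c(ev_long_side = ev_long_side, ev_short_side = ev_short_side))
```

Both bets have the same 50% chance of a bad outcome. Only the expected value separates the good decision from the bad one.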

        Perhaps one concern comes from the topic—bets on election outcomes. People’s partisan views influence how they feel about election outcomes and how they feel about people who bet against their candidate. Maybe, if the bet were on the event “More than 5000 children born in the United States in 2023 will be given the name Robert”, one might expect that the impact of winning or losing the bet would have less of an impact on your reputation.

  3. Can you explain this?:

    > The moral issue that, as a person involved in the Economist forecast, I had a conflict of interest. And, even if not a real conflict of interest, a perceived conflict of interest.

    I’m trying to think of what would be a not real conflict of interest but (only) a perceived conflict of interest.

    • Joshua:

      Maybe you’re right here. I was thinking of, suppose someone had the perception that I was rigging the Economist forecast as a way to get better betting odds for myself. I was thinking that would be a perceived conflict of interest but not a real conflict, as I wasn’t actually manipulating the forecast in that way. But, thinking about it, I guess it’s still a conflict of interest. A conflict of interest is a conflict of interest, even if I don’t act on it.

      • Thanks. Yeah, that’s kinda what I was thinking.

        Of course, everyone here knows any conflict of interest for you would only be an un-manifested, perceived conflict of interest!

  4. I’m aware of that Taleb paper. Do people realise that Taleb made an error in it? Taleb basically miscalculated the volatility in generating Figure 3. There is no issue with how much the 538 prediction changes. Rather, it’s his “rigorous updating” line that uses an incorrect estimate of vote volatility.

      • I think the graph is just to illustrate that if you increase the volatility your model expects, the forecast becomes less variable, not more, so an unusually volatile public opinion does not explain an unusually volatile forecast.

        There are two points:

        1. Volatility in an election does not propagate to volatility in a forecast, but rather pushes the forecast to 50/50
        2. 538 is failing to account for 1

        Point 1 is most of the substance and is just true. Point 2 is questionable since as you point out he doesn’t discuss their actual methodology.
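Point 1 can be sketched in R under a simple assumption: the race follows a driftless Gaussian random walk, so that, conditional on the current lead, the win probability has a closed form. The lead, the number of days left, and the volatility grid below are all hypothetical values chosen for illustration.

```r
# Hypothetical inputs, on an arbitrary latent scale
current_lead <- 0.03
days_left    <- 60
assumed_vol  <- c(0.002, 0.005, 0.01, 0.02, 0.05)   # per-day volatility the model expects

# Under a driftless random walk, the remaining movement is
# Normal(0, assumed_vol^2 * days_left), so:
win_prob <- pnorm(current_lead / (assumed_vol * sqrt(days_left)))

print(data.frame(assumed_vol = assumed_vol, win_prob = round(win_prob, 3)))
```

Holding the current lead fixed, a larger expected volatility moves the win probability toward 50/50. The sketch says nothing about whether any particular forecaster's volatility assumption is right.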

        You can look at 538’s method of accounting for volatility and it involves too much heuristic tuning to say definitively that it’s correct or incorrect. They definitely did better than most forecasts, but after the first two huge bumps, it probably would’ve been wise to revise the magnitude of their “event bump” to avoid the third, and if I were a betting man it wouldn’t have been absurd to try and arbitrage the “debate bump.”

        • To be clear, I do think that Taleb is being over the top harsh about a forecast that’s consistently one of the best available. While their 2016 election forecast does look fucky, the ones before and after don’t seem to have that systematic error. I do think they should have revised the bounciness upwards by the third debate hump. That kind of adjustment mid-cycle can give the appearance of impropriety. But I think it’s correct to be learning about the election from the election. Or, in the style of this blog, they could have the “bounciness” b_e of this particular election as being randomly drawn from a grand distribution of election “bouncinesses” B and have Bayes pull b_e away from mean(B_mean) as we observe the polling cycle.

        • > 1. Volatility in an election does not propagate to volatility in a forecast, but rather pushes the forecast to 50/50

          Actually, in Taleb’s random walk model, *this is not the case*. Higher volatility means more movement is expected between now and the future, but it also means the *current* state is more likely to be extreme as well. The forecast is pushed to 50/50 only conditional on the current state, but the full course of forecasts is actually independent of the volatility.

          Thus, in his graphs, the 538 is the correct, well calibrated graph, while his “rigorous updating” line is a graph based on over-estimating volatility.

    • > There is no issue with how much the 538 prediction changes.

      Leaving aside the chart (I don’t remember whether it’s wrong but that wouldn’t surprise me at all) there is also an issue with how much the 538 prediction changes. When a forecast moves all over the place, that suggests that it was not always correct. (In a probabilistic, low p-value, unexpected behaviour sense.) However, it’s not clear what it means to say that it “violates arbitrage boundaries” or what the relevance is. If the forecasts are correct (i.e. the model is correct) large swings are very unlikely but not impossible and the existence of arbitrage is not in correspondence with the correctness of the model.

      For those who missed it at the time that paper was discussed here: https://statmodeling.stat.columbia.edu/2020/10/12/more-on-martingale-property-of-probabilistic-forecasts-and-some-other-issues-with-our-election-model/

    • That, and, as we hashed out in 2020 with discussion of Aubrey Clayton’s paper, the whole application of arbitrage-bounds, martingale properties and so on to election forecasting is terribly muddy. It seems mostly to be a hyper-formalized rationalization of an intuitive heuristic that served Taleb well in option trading, applied to something he disliked based on an ‘eyeball test’.

      • I’m surprised at the disdain for eyeball tests here. I love eyeball tests. My preferred posterior predictive check is seeing if the inverse CDF of data points in the posterior predictive distributions looks kind of uniform enough. A lot of statistics to me is mostly just theoretical scaffolding for building graphs to eyeball test.

        It’s theoretically true that under a correctly specified model, forecasts jumping around becomes vanishingly unlikely. That provides the theoretical scaffolding for eyeballing the 538 2016 forecast and saying “that looks funny,” because it definitely looks funny.

        Delving into the actual forecast, I think it reveals a real, specific weakness of the model. A lot of the 538 model is hard coded constants in heuristic adjustments. For example, it deducts a constant 3-4 points from a candidate after their convention. Constants like this can only be calibrated ahead of time from recent past elections; they cannot learn from the current election. The ability to transition from general knowledge to learning about the specifics in a principled way is one advantage of a Bayesian model. Elliot and Andrew’s Economist forecast, for example, models polling biases as parameters learned from the current election over time, with the partisan nonresponse bias in particular comparing favorably to 538’s hard coded bounces.

        Suppose for the sake of argument that we saw in the next election the polls go from 75% to 25% to 75% in the first month. A human being would correctly conclude “the polls aren’t very informative this particular cycle for whatever reason, it’s hard to call”. But because the 538 forecast has its volatility hard coded, it WILL swing around wildly unless they break their policy of not adjusting the model mid-cycle. The 538 model categorically cannot learn from the intertemporal behavior of the polls within a cycle, so if the volatility constants they choose at the beginning are miscalibrated, you CAN arbitrage their forecast.

        All this to say, I think Taleb has a real point here, and his eyeball test on the 2016 election is not wrong.

        • I lost some text there somehow, it should read:

          it WILL continue to swing around wildly until the end of the election unless they break their policy of not adjusting the model mid-cycle.

        • Somebody,

          Yeah, Fivethirtyeight does a lot of complicated things which make some sense, and it’s clear that they’re not using a fully Bayesian approach. I understand: there are tradeoffs. A Bayesian approach is clean and, if you stick with it, it will have the martingale property, but: (a) the model can get complicated, and, realistically, if you stick with the Bayesian approach there’s lots of information that you might not put into your model just because it’s not clear how to do so, and (b) even if you go full Bayes, at some point you’ll find a problem with your model (Cantor’s corner) and then it will lose the martingale property. So, of course it will not be internally consistent. Which is fine. Internal consistency is a goal never to be fully achieved.

          What frustrates me about the Fivethirtyeight team is how resistant they seem to be to learning from outside criticism. The whole thing seems so weird to me. I love criticism; it’s a way for me to learn. And it seems bizarre for the Fivethirtyeight team to somehow think that all the consequences of their many different forecasting choices will just happen to work out correctly. What a naive attitude toward statistical modeling!

        • Somebody, oh I’m a huge fan of visual checks and heuristics! It’s the bluster about ‘martingale’ and ‘arbitrage bounds’ that doesn’t really add anything useful. The issue is the wall of formalism used to justify those when the critique came down to visually checking the forecast and judging it to be too bouncy. As Andrew says, 538 has a funny insular attitude to the whole thing, but that doesn’t mean you can eyeball a time series and determine ‘if it is a martingale or not’.

        • The polls moving more rapidly leads to less volatile results *if* and *only if* you look at this and conclude polls aren’t informative. But this is a choice. If you believe that they mean that the actual state of the election is changing rapidly, then it’s perfectly appropriate to change the prediction forecasts rapidly as well.

          The volatility of a well calibrated forecast is independent of the volatility of the process. It does not go *down*.

          See e.g. R code:

          set.seed(123)

          volatility = 0.05

          # latent state of the race: a driftless random walk over 100 time steps
          A = cumsum(volatility * rnorm(100))
          invlogit = function(t) exp(t)/(1+exp(t))
          polls = invlogit(A)
          probwin = NULL

          for (t in 1:98){
            remainingt = 100-t
            # probability of winning by simulation: 1000 draws of the remaining movement
            probwin = c(probwin, mean(A[t] - volatility*colSums(replicate(1000, rnorm(remainingt))) > 0))
          }

          plot(polls, type = "l", ylim = c(0,1))
          lines(probwin, col=2)
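A closed-form variant of the simulation above makes the volatility-independence claim easy to check. This is a sketch under the same driftless Gaussian random-walk assumption. It replaces the inner Monte Carlo step with the exact normal probability, and `win_prob_path` is a hypothetical helper name. Feeding the same standard-normal shocks through two very different volatilities should give the same forecast path.

```r
set.seed(123)
n_steps <- 100
shocks  <- rnorm(n_steps)   # shared standard-normal shocks

# Forecast path for a random walk with the given volatility.
# Given the state at time t, the remaining movement is
# Normal(0, volatility^2 * (n - t)), so P(final state > 0) is exact.
win_prob_path <- function(volatility, shocks) {
  n     <- length(shocks)
  state <- cumsum(volatility * shocks)
  t     <- 1:(n - 1)
  pnorm(state[t] / (volatility * sqrt(n - t)))
}

path_low  <- win_prob_path(0.01, shocks)
path_high <- win_prob_path(0.20, shocks)

# Largest difference between the two forecast paths
max_gap <- max(abs(path_low - path_high))
print(max_gap)
```

The volatility cancels in the ratio inside `pnorm`, so the two paths agree up to floating-point error. This holds only when the model's assumed volatility matches the volatility that generated the data. It does not cover a model whose volatility is fixed at the wrong value.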

        • > The volatility of a well calibrated forecast is independent of the volatility of the process. It does not go *down*.

          What I mean is, “if you increase the volatility your model expects, the forecast becomes less variable”. If you also increase the volatility of the input data, they are independent.

          > The polls moving more rapidly leads to less volatile results *if* and *only if* you look at this and conclude polls aren’t informative. If you believe that they mean that the actual state of the election is changing rapidly, then it’s perfectly appropriate to change the prediction forecasts rapidly as well.

          Yes, if your model of the world does not have the volatility of the current election as a partially free parameter, then the forecast will continue to update rapidly and the forecast is trivially correct according to its own model. But we can critique modeling choices, and models can fail to capture behavior that we *know* as humans is true. If you think polls are equally informative no matter how rapidly they were moving in the past, that’s equivalent to the belief that “right now” the polls are going to stop swinging abruptly. It means that the current election’s polling timeseries tells you *nothing* about this election or how it might differ from your expectations at the beginning. I feel comfortable enough calling that a failure mode.

        • I played a bit with that code. Large swings in the forecast probability are unlikely – as expected for a proper, well-calibrated forecast.

          For example, in less than 40% of the runs the sequence of forecasts goes from 60%/40% to 40%/60% (or vice versa) – and in only 10% will that happen twice. (Independently of the value used of the “volatility” parameter.)

          More drastic swings are much less probable. Reversing a 80/20 forecast has probability lower than 20%. A full cycle 80/20 -> 20/80 -> 80/20 happens in less than 2% of the runs.
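One way to reproduce this kind of tally is sketched below. It is not the commenter's actual code. It assumes the same driftless Gaussian random walk, uses the exact normal win probability in place of the inner simulation, and adopts one particular reading of "goes from 60/40 to 40/60": the forecast path touches one side of the band and later touches the other. The helper names and the choice of 2000 runs are hypothetical.

```r
set.seed(2024)

# Forecast path for one simulated race (exact win probability at each step)
forecast_path <- function(n_steps = 100, volatility = 0.05) {
  state <- cumsum(volatility * rnorm(n_steps))
  t     <- 1:(n_steps - 1)
  pnorm(state[t] / (volatility * sqrt(n_steps - t)))
}

# Count reversals: each time the path moves from one side of the band
# (>= hi or <= lo) to the opposite side, that is one reversal.
count_reversals <- function(p, hi = 0.6, lo = 0.4) {
  side <- 0        # +1 after touching hi, -1 after touching lo, 0 before either
  reversals <- 0
  for (x in p) {
    if (x >= hi && side != 1) {
      if (side == -1) reversals <- reversals + 1
      side <- 1
    } else if (x <= lo && side != -1) {
      if (side == 1) reversals <- reversals + 1
      side <- -1
    }
  }
  reversals
}

n_runs <- 2000
paths  <- replicate(n_runs, forecast_path(), simplify = FALSE)
rev_60 <- sapply(paths, count_reversals, hi = 0.6, lo = 0.4)
rev_80 <- sapply(paths, count_reversals, hi = 0.8, lo = 0.2)

swing_summary <- c(
  once_60_40  = mean(rev_60 >= 1),
  twice_60_40 = mean(rev_60 >= 2),
  once_80_20  = mean(rev_80 >= 1),
  twice_80_20 = mean(rev_80 >= 2)
)
print(round(swing_summary, 3))
```

The exact fractions depend on the definition of a swing and on the number of time steps, so they need not match the figures quoted above.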

          [I’m not saying that the 538 forecasts were this bad – or bad at all for that matter. I was careless above and said that “there is also an issue” when I should have said that “there could be”.]

        • Well, it’s subjective, but it seems to me that according to the random walk model, a well calibrated forecast probability looks a lot more like the 538 estimate than Taleb’s supposition. 538’s forecasts certainly never flipped to a “Trump has 80% chance of winning” forecast, yet the model can generate that fairly often.

          > If you think polls are equally informative no matter how rapidly they were moving in the past, that’s equivalent to the belief that “right now” the polls are going to stop swinging abruptly.

          No, it’s equivalent to the belief that the degree to which they have moved so far in the current election cycle will be the same as how much they move in the rest of the election cycle. If you think right now the polls are going to stop swinging, then current polls are actually *more* informative the higher the degree of past volatility there is.

        • > No, it’s equivalent to the belief that the degree to which they have moved so far in the current election cycle will be the same as how much they move in the rest of the election cycle. If you think right now the polls are going to stop swinging, then current polls are actually *more* informative the higher the degree of past volatility there is.

          I think there’s some miscommunication here. If you’re predicting the endpoint of a random walk, and volatility is set to a constant that is less than the actual volatility of the walk, predictive draws from your model will have the swinging decrease abruptly at the “current” forecasting time, and the path of your predictions will swing around wildly the whole time. If volatility is given a prior and learns from the walk, the forecast will start off swinging but converge gradually until it matches the ideal “volatility-independent” swinginess you demonstrated.

          > Well, it’s subjective, but it seems to me that according to the random walk model, a well calibrated forecast probability looks a lot more like the 538 estimate than Taleb’s supposition. 538’s forecasts certainly never flipped to a “Trump has 80% chance of winning” forecast, yet the model can generate that fairly often.

          It’s not just the magnitude of swings, but the timing. The bumps correspond with the conventions and debates, precisely where 538 has to apply their hard coded manual adjustments. So I do think here

          https://imgur.com/lzj7iVN

          I do think it’s readily apparent that it’s gonna go back down

        • > 538’s forecasts certainly never flipped to a “Trump has 80% chance of winning” forecast, yet the model can generate that fairly often.

          The discussion depends a lot on terms like “often” and “swinging” that are not well defined.

          In any case, the martingale property imposes some bounds which are independent of the distribution of future forecasts.

          If the forecast is now at 75%/25%:

          – at every point in the future the probability that we forecast even odds (50%/50%) cannot be more than 1/2.

          – at every point in the future the probability that we forecast a complete flip (25%/75%) cannot be more than 1/3.

          (N.B. I’m making a general point, it’s not a commentary about the 538 forecasts in particular.)
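These two bounds follow from a one-line argument. If the forecast is p_now today and forecasts form a martingale, then the expected future forecast is p_now. Let q be the probability that the future forecast is at or below some level x. Since the future forecast can never exceed 1, p_now <= q * x + (1 - q) * 1, which rearranges to q <= (1 - p_now) / (1 - x). A minimal R sketch, with a hypothetical helper name, reproduces the two numbers:

```r
# Upper bound on the probability that a martingale forecast, currently at
# p_now, sits at or below p_later at any fixed future time.
# (Hypothetical helper; follows from E[future forecast] = p_now and forecast <= 1.)
max_prob_at_or_below <- function(p_now, p_later) {
  stopifnot(p_later >= 0, p_later < p_now, p_now <= 1)
  (1 - p_now) / (1 - p_later)
}

bound_even <- max_prob_at_or_below(p_now = 0.75, p_later = 0.50)  # even odds or worse
bound_flip <- max_prob_at_or_below(p_now = 0.75, p_later = 0.25)  # complete flip or worse

print(c(bound_even = bound_even, bound_flip = bound_flip))
```

The bound uses only the martingale property and the fact that probabilities are at most 1, so it does not depend on the volatility or on the shape of the forecast distribution.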

  5. Zhou and Somebody:

    I’ve not read Taleb’s paper carefully so I offer no opinion on the details of his graphs. I think his general point makes sense, which is that because of the martingale property, a calibrated forecasting process should have a low probability of big swings of forecast probabilities.

    Realistically, calibration of a forecast process is a goal that is just about never achieved, so it does not surprise me that the Fivethirtyeight.com forecast series do not appear to be consistent with the martingale property. Our Economist forecasts didn’t have that property either, so I wouldn’t expect it of Fivethirtyeight.com, given that their procedure is not based on a generative probability model.

    I agree with Somebody that the Fivethirtyeight.com forecasts are excellent compared to a lot of what’s out there. It’s hard to make a forecast process.

    I’m still hung up on the question of how to best express uncertainty at the beginning of the forecast, as discussed in the second and third paragraphs of the “Real-world prediction markets vs. the theoretical possibility of betting” section of the above post. That didn’t get much discussion; maybe it needs its own post.

    • The problem is, what is this low probability?

      Ultimately, if you believe that poll swings represent something significant about the state of an election campaign, the time-evolution of the forecast probabilities seen by 538 and others actually does have the appropriate amount of volatility. One can easily see this by producing a simulation model. A well calibrated model looks like a random walk, and those have swings *and* martingale properties.

      The IMO reasonable way to get a smooth line is to introduce a large overall polling error term that is independent of time.

      • Zhou:

        You can also get a smooth line by allowing for the probability of large swings in public opinion during the campaign. This keeps the probabilities from getting too close to 0 or 1 until shortly before the election.

  6. This comment from the 2012 link

    “My first reaction was: this looks pretty but it’s hyper-precise. I’m a big fan of Nate’s work, but all those little wiggles on the graph can’t really mean anything. And what could it possibly mean to compute this probability to that level of precision?”

    is interesting when one looks at how the Economist’s forecasts were presented. Those charts also had wiggly lines including in the period between the forecast and the election and it was not even clear what the latter represented – at least to me: https://statmodeling.stat.columbia.edu/2020/06/19/forecast-betting-odds/#comment-1363702 and 2020/07/31/thinking-about-election-forecast-uncertainty/#comment-1397684
