Don’t kid yourself. The polls messed up—and that would be the case even if we’d forecasted Biden losing Florida and only barely winning the electoral college

To continue our post-voting, pre-vote-counting assessment (see also here and here), I want to separate two issues which can get conflated:

1. Problems with poll-based election forecasts.

2. Problems with the pre-election polls.

The forecasts were off. We were forecasting Biden to get 54.4% of the two-party vote and it seems that he only got 52% or so. We forecasted Biden at 356 electoral votes and it seems that he’ll only end up with 280 or so. We had uncertainty intervals, and it looks like the outcome will fall within those intervals, but, still, we can’t be so happy about having issued that 96% win probability. Our model messed up.

But, here’s the thing. Suppose we’d included wider uncertainty intervals so the outcome was, say, within the 50% predictive interval. Fine. If we’d given Biden a 75% chance of winning and then he wins by a narrow margin, the forecast would look just fine and I’d be happier with our model. But the polls would still have messed up, it’s just that we would’ve better included the possibility of messing up in our model.
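
As a rough illustration of how much the headline number depends on that choice, here is a minimal sketch (a plain normal approximation to the national two-party share, not our actual model; the error scales are made up):

from scipy.stats import norm

forecast_share = 0.544   # our forecast of Biden's two-party share (from above)
outcome_share = 0.52     # roughly where the count seems to be heading

# Hypothetical standard deviations for total forecast error on the national share
for sd in (0.015, 0.02, 0.03):
    p_win = 1 - norm.cdf(0.5, loc=forecast_share, scale=sd)   # P(share > 50%)
    z = (outcome_share - forecast_share) / sd                 # where the outcome fell
    print(f"sd = {sd:.3f}: P(popular-vote win) = {p_win:.3f}, outcome at {z:+.1f} sd")

This is the popular vote only, with no electoral college, but it shows the tradeoff: a wider error term lowers the headline probability and also makes the realized outcome look less surprising.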

To put it another way: a statement such as “The polls messed up” is not just a statement about the polls, it’s a statement about how the polls are interpreted. More realistic modeling of the polls can correct for biases and add uncertainty for the biases we can’t correct. But when the vast majority of polls in Florida showed Biden in the lead, and then Biden lost there by a few percentage points, that’s a polling error. As the saying goes, The prior can often only be understood in the context of the likelihood.

P.S. To address a couple issues that came up in comments:

– We can’t really say how much the polls messed up in any state until all the votes are counted.

– The polls messed up more in some states than others. Florida is the clearest example of where the polls got it wrong.

– If you include a large enough term for unmodeled nonsampling error, you can say that none of the polls messed up. But my point is that, once you need to assign some large nonsampling error term, you’re already admitting there’s a problem.

– Saying that the polls messed up does not excuse in any way the fact that our model messed up. A key job of the model is to account for potential problems in the polls!

Ultimately, there’s no precise line separating problems in the polls from problems in poll-based forecasts. For a simplified example, suppose that all problems would be fixed by adding 2 percentage points to the Republican share in each poll. If the pollsters did this before releasing their numbers, we’d say the polls are fine, no problem at all. A bias corrected is no bias at all. But if the bias is just sitting there and needs to be corrected later, then that’s a problem, one whose scope is reduced but not eliminated by adding an error term allowing the polls to be off by a couple points in either direction.

To put it another way, we already knew that polls “mess up” in the sense of having systematic errors that vary from election to election. Much of poll-based election forecasting has to do with making adjustments and adding uncertainties so that this doesn’t mess up the forecast.

197 thoughts on “Don’t kid yourself. The polls messed up—and that would be the case even if we’d forecasted Biden losing Florida and only barely winning the electoral college”

  1. I’m not so sure that “messed up” is the best statistical interpretation, although from a public education perspective it is certainly merited. As more information comes in (and note that large absentee counts are still to come from states like NY which aren’t getting much attention) it looks like the results may be:

    a) within margin of error for the national popular vote
    b) within margin of error for most states
    c) off by more than margin of error for Florida

    With ~25 states getting polling attention, that one ‘miss’ is within the expected range.

    In other words, this is more of a likelihood ~0.10 event than a ~0.01 event. Enough to say that we need to work on our model and do some updating of our priors, not enough to conclude that we need to ‘reject’ the existing hypotheses (i.e., all the work that goes into polling).
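
    As a quick back-of-the-envelope check on that (a sketch that assumes independent 95% intervals across states, which polling errors certainly are not):

    # Chance of at least one state landing outside its nominal 95% interval,
    # if the ~25 heavily polled states were independent (they aren't)
    n_states = 25
    coverage = 0.95
    p_at_least_one_miss = 1 - coverage ** n_states
    print(f"P(at least one miss out of {n_states}) = {p_at_least_one_miss:.2f}")  # ~0.72

    The caveat is that polling errors are correlated across states, so several same-direction misses would be more surprising than this simple count suggests.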

    • I think the Upper Midwest states were also significantly off in vote margin, even though the winners (where announced) are not unexpected. Ohio and Iowa should have been closer than they are, and Wisconsin less close.

  2. This outcome demonstrates the power of betting markets, which had Biden at around 65%. Yes, there is stupid money. But there is usually more smart money willing to take the other side of any bet for a profit. Some markets have tiny limits. But there are larger markets, and prices are usually consistent across markets.

    Bettors read statistical blogs and watch the news. Markets aggregate their information. Respect the markets!

      • My current position is somewhat in line with KL here.

        Yes, there were some crazy bets, like way too much chance to win California, but you had to offer 30:1 odds to get action on that bet, and the betting sites take a share of your winnings, so there’s a lot of friction in correcting that pricing error. If you think Trump has a 50% chance and you can get someone to offer you 3:2 odds, why would you gamble $1000 to win $25 (or whatever) on the California bet?

        I had thought, or perhaps hoped, that the markets were skewed by Trump fans who just couldn’t bear to see their guy at bad odds, or who were wildly overoptimistic, but that seems like less of an explanation to me now.

        The polls were wrong last time too, and in the same direction and by roughly the same magnitude. We will have to let polling firms tell us why they weren’t able to fix whatever went wrong last time, but perhaps whatever made them wrong last time is not easy to fix, and “the markets” knew that.

        I just heard a podcast in which a pollster who had numbers much more favorable to Trump was explaining (days before the election) why he thought the polls were wrong, and he said, among other things, something like “Nate Silver thinks people don’t lie to pollsters, but I know they do.” He was speculating on a “shy Trump voter” effect, although “shy” might be the wrong word. Others have said perhaps a lot of Trump voters deliberately tried to deceive pollsters; that’s possible too. Either way, if there are a lot of Trump voters who are responding to polls but stating the opposite of their intent, this would happen in all the polls and explain why most of the polls didn’t have enough Trump support.

        It’s also possible that it was the “likely voter” models that were all wrong, I suppose: maybe the polls were right about general sentiment of eligible voters but underestimated the number of Trump-preferring eligible voters who would actually vote. This seems like a less likely explanation than the above, but really I don’t know.

        I also suspect that most polls are conducted by firms that are less pro-Trump than the average voter. I think almost all polling firms are really trying to get it right rather than to have things come out in favor of the ideologically preferred candidate of the pollster — their businesses will presumably suffer if they have a consistent bias — but I also think that if you think Trump “should” do substantially worse than last election, and your polls are coming out that way, you will be less likely to question them than if you expected Trump to be doing well.

        Either way, although it’s possible that the markets were lucky rather than good, my current thinking is that in the future I will trust the markets more and the polls less.

        Of course, there’s nothing stopping you (or me, or anyone) from including market information in the forecast. That’s probably the best thing to do.

        • Phil:

          Strong Republican performance in the House and Senate races suggests to me that much of the surprising election outcome has to do with voter turnout.

        • I’m guessing that the higher turnout messed up the “likely voter” calculations.

          Also, the level of divisiveness, with large majorities of both Trump and Biden supporters expressing very strong feelings about the possibility of their candidate losing, makes me wonder if the publicized strong early Democratic turnout for early/mail-in voting led to a higher-than-expected turnout for Trump on Election Day.

        • Perhaps this is evidence that the “high turnout is better for Democrats” assumption is wrong or at least situational. (It might hold true if the R candidate was a classic Bush or Romney type, rather than effectively a populist.)

        • I’m really interested in this. In an MRP-style model you have to post-stratify, so modelling turnout by each group seems quite crucial. Am I understanding this right? If so, would it be interesting to condition on the actual turnout by group and see how much that helps explain the surprising result?
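
          Something like the following toy poststratification is what I have in mind, swapping the turnout column from assumed to (eventually known) actual values; all numbers are invented:

          import pandas as pd

          # Two demographic cells with invented numbers
          cells = pd.DataFrame({
              "population":      [40_000_000, 60_000_000],  # eligible voters per cell
              "support_biden":   [0.60, 0.45],              # modeled Biden share within cell
              "turnout_assumed": [0.70, 0.55],              # turnout used in the forecast
              "turnout_actual":  [0.72, 0.66],              # hypothetical realized turnout
          })

          def poststratified_share(df, turnout_col):
              votes = df["population"] * df[turnout_col]
              return (votes * df["support_biden"]).sum() / votes.sum()

          print("forecast:           ", round(poststratified_share(cells, "turnout_assumed"), 3))
          print("with actual turnout:", round(poststratified_share(cells, "turnout_actual"), 3))

          Holding the within-cell support fixed and plugging in realized turnout would show how much of the miss can be attributed to turnout rather than to, say, differential non-response.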

        • Phil –

          > I just heard a podcast in which a pollster who had numbers much more favorable to Trump was explaining (days before the election) why he thought the polls were wrong, and he said, among other things, something like “Nate Silver thinks people don’t lie to pollsters, but I know they do.” He was speculating on a “shy Trump voter” effect, although “shy” might be the wrong word. Others have said perhaps a lot of Trump voters deliberately tried to deceive pollsters; that’s possible too. Either way, if there are a lot of Trump voters who are responding to polls but stating the opposite of their intent, this would happen in all the polls and explain why most of the polls didn’t have enough Trump support.

          I don’t think there’s much evidence to support the theory that the inaccuracy in the polls is because of people stating the opposite of their intent.

          First, there isn’t the difference between live-caller polls and Internet polls that you might expect with such an intent.

          Second, the theory would be that pubz lie and say that they’re demz who are voting for Biden – except that then the pollsters would adjust their sampling for party ID, which would essentially eliminate those liars from the sampling because the sample would seem to overrepresent demz.

          And as for the “shy” effect, we don’t seem to see Trump outperforming his polling (at least in 2016, not sure about 2020) more in blue areas as opposed to red areas, as we’d presumably see if people were intimidated out of admitting they’re Trump supporters. Also, the “shy Trump voter” idea was initially based on the social desirability influence a la the “Bradley effect,” whereby people were worried about being perceived as racist; but there was no Bradley effect with Obama, as he didn’t underperform his polling but overperformed it – the opposite of what you’d expect with a Bradley effect.

          Assuming a shy Trump voter effect is politically convenient for conz as it fits with a politically expedient narrative. There could well be other explanations for the polling error, and the shy Trump voter effect is a convenient confounding theory. Maybe it’s true, but it seems the evidence is lacking.

        • Plus what Andrew said – so it wouldn’t be a shy Trump voter effect but a shy republican voter effect, which doesn’t make a whole lot of sense to me.

          But perhaps, yeah, pubz were more animated to vote than the polling weighting accounted for. Or perhaps for some reason pubz are more likely to not respond to polling. Seems to me that some serious non-response sensitivity analysis is needed. Granted, it would be complicated to do.

        • Andrew and Joshua,
          I certainly concede to not knowing what the cause is, or even having a good idea what it is.

          Andrew, I don’t think the “shy Trump voter” theory rests on the idea that voters are lying about _both_ party identification _and_ presidential vote; someone could correctly give party ID (or claim to be independent) and still say that they’re voting for Biden when they aren’t. Also, they wouldn’t have to lie about their intended vote; they could just decline to answer the question, or decline to participate in the poll at all, which would have (I assume) less effect but still some effect.

          But sure, an incorrect ‘likely voter’ model could also be the culprit instead. Or both an incorrect voter model and a ‘shy voter’ effect; it’s not like they’re mutually exclusive.

          I’m guessing the polling companies will be able to figure this out somehow. I hope they tell us!

        • Phil –

          > someone could correctly give party ID (or claim to be independent) and still say that they’re voting for Biden when they aren’t

          The thinking on that is that it isn’t likely a phenomenon big enough to have a significant effect because such a high % of the people who say they’re pubz also say they support Trump. There aren’t enough left over to lead to a high level of inaccuracy in the polling that we see.

          But my thinking is that the “shy Trump voter” effect means people intimidated out of saying they’re going to vote for Trump. That’s what I’m saying there is little evidence in support of. And you see it confidently asserted by many people despite that lack of evidence. It’s a classic case of correlation = causation thinking: partisan pubz hearing some people claiming they lied, and extrapolating from that without controlling for the fact that it’s a convenient narrative (that meanie demz are intimidating poor pubz who are victims. “Oh, the humanity!”)

          But sure, there could be a combo of effects. My personal bias is that it’s a differential in non-response, but I don’t really have evidence for that either. I’m more just responding to the annoying lack of skeptical control for confounding variables in a self-serving narrative that’s so ubiquitous among conz.

        • Most polls are weighting to stated 2016 vote. So this is a pitch that involves voters being willing to admit they voted for Trump in 2016 and are Republicans but shy to admit they’d vote for him in 2020? Seems dubious.

        • I think the pollsters must have had a very difficult time with their likely voter models this year. With many new mail-in options, fear of the virus at the polls (discrepant by party and age), and even mail delays causing many mailed ballots to not be delivered in time, there are many novel differences in the mechanics of voting, and hence, who actually casts a ballot that gets counted.

          If pollsters used a likely voter model from past years, a voter with the same level of intent to vote may have been more or less likely, depending on circumstances, to actually vote. If they change the model to account for this, they have almost nothing to calibrate it against.

      • No way. You gotta understand longshot odds as a *natural* property of time discounting (so Trump at 3 in Cali is likely more a property of longshot odds). The markets got Florida, NC, OH and a bunch of other states very right. I’m aware of your previous stance that these are “less informed” – but I think yesterday showed the markets to be of serious value.

        • Dan:

          No, it’s not time discounting. You could get that bet on California two days before the election. Full credit to the markets for Florida etc. but that doesn’t mean the markets are flawless. If we’re interested in the markets and respect the markets, it makes sense for us to see where they fail.

          But, yeah, I take your larger point that the data suggest we should take the markets seriously. Just cos they’re not perfect doesn’t mean we can afford to ignore them.

      • Andrew: Yes, this is a well-known effect. Let’s examine the depth of the markets.

        ElectionBettingOdds brags “Over $1 million bet” across three different sites. The overwhelming amount would be staked on the presidential election winner, with some on the margin of victory or vote share. There might be less than $100K bet on swing states like Pennsylvania or Florida, and less than $30K on California. You would need to lay over $3K to win $100. So, over the past few months, there were perhaps 10 recreational players who bet $100 apiece. Many people spend more on lottery tickets.

        In contrast, there were larger unregulated offshore outlets that were taking $50K+ bets on the Presidential Election, before adjusting their odds slightly and taking another $50K+. Bettors at other exchanges can try to arbitrage or at least imitate their odds. Consequently, global markets offer very similar odds that reflect a deep, liquid, and informative market.

      • Predictit, at least, has a 5% withdrawal fee, so once costs are accounted for along with the bid/ask spread, 3% on Predictit is effectively 0% for Trump. So what is going on here isn’t so much stupid money as a reflection of a market structure that makes simple interpretation of probabilities fail at the level of around 5%. Insomuch as it does reflect stupid money, it is people who think Trump has a 0% chance and don’t understand their trading costs. I expect similar structural reasons on other sites.
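
        A quick worked example of why no one arbitrages that 3% away, using only the 5% withdrawal fee (the other fees and the spread only make it worse):

        # Buy the ~97-cent side to bet against the 3% Trump price, and suppose you win
        stake = 0.97            # price paid per share
        payout = 1.00           # value of a winning share
        withdrawal_fee = 0.05   # 5% fee when you take the money out

        net = payout * (1 - withdrawal_fee) - stake
        print(f"net per share even when you're right: {net:+.2f}")   # -0.02, a guaranteed loss

        So even a bettor who is certain of the outcome has no incentive to push that last 3% down toward 0%.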

        • Scientist:

          I don’t know about Predictit, but according to Josh Miller, who knows about these things, it really was possible to make money on that California bet on Betfair, as they take the vig out of your winnings, not out of the total bet. So, yeah, it’s stupid money. Or money that people were throwing around for entertainment, money that was sitting on the floor that nobody bothered to pick up, or something like that.

    • The betting market went from Trump 35% to Trump 75% as the early results came in, despite the fact that the only surprising results were from Florida. They are now (2:00 on Wednesday) back to Trump 25%.

      Betting markets in theory are a good indicator, for all the reasons that KL and Phil say. Existing politics betting markets, with thin total pools and large transaction costs, are not there in practice.

      • It makes sense to me that the markets would respond to information as it came in. Florida returns suggested there was a substantial polling error. When I saw the Florida results I, too, shifted my assessment strongly away from Biden. The fact that Florida came in fairly strong for Trump certainly suggested the race was going to be a lot closer than the central estimate from the national polls.

        • About 15 minutes after making that garbled post I realized that amazingly I had forgotten to have coffee this morning! As a confirmed addict, that says something about my post-election state of mind.

        • Phil:

          Betting markets were offering a 16% net return if Biden won the popular vote (ie 86% chance). The market was deep at this price, and Betfair keeps only 5% of your winnings. That was free money, a lot of it. That was more extreme than California. I suspect the problem is that there are few US bettors in those markets (not allowed) and people believed too much in the possibility of widespread irregularities.

        • If Predictit gave a 16% net return that wouldn’t be too surprising because you are limited to $800. Betfair is unlimited and they were still pretty far off. Also, IMO the movement to Trump during the night was way too much. I suspected people were reacting to the big polling error they saw in Florida and extrapolating it to the upper Midwest, but Ohio polls and outcomes were pretty consistent at that point. 50:50 made sense, not 70:30.

      • I don’t agree that the betting markets are all that good, except that just before the election they might be closer to the outcome. Maybe that’s good for speculators, and maybe there’s a way to do arbitrage somehow. But look back a month or two, and compare their time evolution to the economist or 538-style forecasts.

        Most of us are interested in forecasts because they ought to give us some predictability, or let someone (maybe a campaign) do some corrective action, not because they look accurate when it’s too late.

    • People are overinterpreting the instantaneous market signal.

      I really don’t care at all about betting markets but the stock market is one big betting market that I’m very familiar with. It’s frequently wrong. And it’s frequently right. It can be right today and wrong tomorrow, and wrong for all the right reasons and right for all the wrong reasons.

      Markets converge on the outcome through the incremental accumulation of information. But they converge on it by in some cases wildly overreacting to news and in others underreacting to news and in still others reacting appropriately. The real signal is the trend between the two extremes and it’s not possible to know if that’s “right” or “wrong” at any given moment.

      So the polls were off in Florida. Last night it looked good for Trump in some respects but since vote count takes longer in the cities where there are more voters and more left leaning voters, the trend over time is to the left, and even though Trump won Florida it still didn’t look that good for him because the other races were close.

  3. Lay people (non-statisticians) don’t like uncertainties. We don’t want wider CIs. We want better polls and narrower CIs. If better polls are too expensive, perhaps survey companies should group themselves into consortia and do fewer and higher-quality polls in each state.

    • Do explain why a pair of $1M polls is better than 20 $100K ones.

      Include in your explanation the purpose of these polls. Hint: the purpose is not primarily to predict the outcome of an election.

      Answer: emoctou eht ecneulfni taht snoitca elbane ot si tI.

      • @Dzgaughn,

        I am not a statistician. I truly don’t know if my idea of fewer and higher-quality polls is better. I just imagined that it could be and tried to contribute to the debate. But I leave the decision to polling companies and statisticians.

        If you are truly interested, the rationale behind my idea was: polls have a fixed cost and a variable cost. The more coverage they have over counties in each state’s territory, the larger the sample, the better the point estimate, and the narrower the confidence interval.

        I do understand that identifying the trend is more important than having a single point estimate. I am not arguing in favor of just two polls. I could go to the opposite extreme: do you mean that it would be better to interview 10 random people every day of the year instead of doing 10 very good polls focused on the three months before the election? I know that this is not what you mean. A balance must be found.

        The fact is that the polls not only got the final estimates wrong. They also got the wrong trend. I thought that might be due to a coverage problem. But of course, I hope pollsters can figure this out.

      • I truly don’t know how the sample sizes are calculated in each state. But I assume that they are not completely at random and some stratification does occur.

        Looking at the vote map in both last elections we can see there is a strong stratification within states, between counties. Not sure if poll samples account for this variation.

  4. I think it would be useful for someone to compute a p-value for the current polling errors. Like given that the assumptions of the polls were true, what is the probability of seeing an overall error this extreme or more? Has anyone computed such a statistic yet for 2020?
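
    A rough sketch of what such a calculation could look like, once certified margins are in (every number below is a placeholder, and treating state errors as independent overstates the evidence because the errors are correlated):

    import numpy as np
    from scipy.stats import chi2

    # Placeholder state-level numbers: poll-average margin, nominal SE, certified margin
    poll_margin  = np.array([0.025, 0.080, 0.085])
    nominal_se   = np.array([0.020, 0.020, 0.020])
    final_margin = np.array([-0.030, 0.005, 0.030])

    z = (poll_margin - final_margin) / nominal_se
    stat = np.sum(z**2)                      # chi-square statistic if errors were independent
    p_value = chi2.sf(stat, df=len(z))
    print(f"chi-square = {stat:.1f} on {len(z)} df, p = {p_value:.2g}")

    A more serious version would include a common national error term, which is exactly the quantity in dispute.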

  5. It is very convenient to say “the polls messed up” when it is your model that doesn’t appropriately model the noise in the polls. It’s like saying “the patient died — hey, looks like the microscope messed up”.

    • Harsha:

      Sorry for not being clear on this. The polls messed up. Our model also messed up! Our model included a term for polling error but that was not enough.

      My point in the above post is not to absolve our model. My point is that, even if our model was fine, that would not mean the polls did not mess up, it would just mean that our model successfully corrected for problems with the polls.

      I added a bit in the above post to clarify.

  6. Andrew,
    How does one explain the discrepancy of Trump’s showing with Hispanics in FL against a candidate, Biden, who is more popular than Clinton?

    Why didn’t polling pick up Trump’s dramatically increasing popularity among Hispanics in FL?

    I would suggest that there are conflicts of interest and lack of cultural understanding as two causes.

    First, most pollsters, like most in the elite media and academia, are Democrats, and it may not be so different than a drug company writing its own medical articles, peer reviewed by high-impact journals, but biased all the same.

    The fix then would be to have GOP pollsters who are pro-Trump (and not Never Trumpers) also do polling and compare results.

    Second, in a state with a large Hispanic presence such as FL, it might help to have Hispanics themselves run some polls to see what the results are; there may be some cultural interaction at play that non-Hispanics may inadvertently not be aware of but that Hispanics might be aware of.

    I take this from my background in Judaism, where a lot of people who are not Jews think they understand Jews but may not really understand so well.

      • Good point. It would be interesting to conduct polls from different groups to see if outcomes are that much different.

        e.g., F, D; F, R; M, D; M, R; and then for cultures such as Hispanic, and see if we get differing results. Each can use experts in polling but otherwise use their own techniques, which might be more “culture” specific.

    • Both Univision and Telemundo sponsored polls. One did pretty well, the other not so much.

      Hispanic populations are not a monolith, and it is a convenient fiction for social scientists to act otherwise. Cubans and Venezuelans clearly behaved very differently at the polls than other groups. You can also see this effect in SES, educational attainment, second generation outcomes, etc. etc.

      Also there are plenty of conservative pollsters and my impression is that most of them were… also wrong.

    • Yeah, this.

      As part of the first factor, I think the high degree of partisanship/polarization makes people miss the fact that people’s values/attitudes/positions don’t fall neatly into two boxes.

      Also, “Hispanic” is a huge category that includes a lot of cultural and political variation. Cuban-Americans =/= Puerto Ricans =/= Mexican-Americans. And first-generation immigrants are not the same politically/culturally as those who have been in the US for many generations, I believe.

    • I had seen a story about Trump’s large gains with Hispanic men before the election in the NYT, so it wasn’t a total surprise, although perhaps the magnitude was a bit larger than expected.

  7. The two big problems in current polling are non-response bias and the use of modeling to assess the likelihood of voting. The problem with polls correcting their nonresponse bias and voting likelihood based on the last election is that both of these variables are liable to shift in the next one… and they may continue shifting along a trend (suggesting you need to make the models more Trump-friendly) or regress (suggesting that you need to make them more Biden-friendly) with precious little to guide you. In this case, Trump appears to have found voters who I suspect the polls just didn’t think were very likely to vote, even adjusting for the fact that he found a substantial number of such people last time, which explains the strong performance in the Senate and House races as well.

  8. Also — doesn’t the fact that Biden, even after several results which were strongly Trump-favorable, still has an 80 percent chance or so to win suggest that a 95 percent chance to win beforehand was not unrealistic?

    • There are two separate questions: how did the polls do, and how well did the model incorporate information from the polls? This post is really about the first question, and the answer is: the polls did lousy.

      • I hear you Phil, and have assimilated the other things Andrew has said about this. But what I still don’t understand is that the polls come with margins of error. The model takes those margins of error seriously (as well as other things) and derives its own margin of error. If, when you’re done, you’re within the incorporated-all-information margin of error, you’re just saying that polling is noisy…. Not wrong. Just noisy. Is there a Florida poll that gave no credible weight to Biden at 48%? No. Now that statement on its own is a bit misleading, since there were multiple polls giving similar, but wrong, results, which means that something other than pure sampling error must be involved. But as I said above… we know what those things are: non-response error and likely voter model error. So there is nothing uniquely bad about these polls… it’s just that there are limits to what polling can possibly do. The Economist and 538 models amalgamate those errors and yielded results. But it’s not clear to me why anyone expects the polling models to be anywhere near their sampling errors alone.

        • I think everyone in the business acknowledges that sampling error is the least of it….which does make it odd that that’s the only error that is regularly reported. The bigger issue is that many of the polls were wrong in the same direction and by about the same amount. And, perhaps more to the point, the error was larger than is usually the case. Both the Economist model and 538 include the fact that there can be an overall poll error, but if that error turns out to be outside the normal range, well, they aren’t expecting that.

          Perhaps it’s no surprise that the errors tend to be in the same direction: if one firm decides that such-and-such is the best method for adjusting for differential non-response, why wouldn’t the others? Maybe there’s also some herd mentality: if your estimate is really different from the others — especially from other polls that have done well in the past — perhaps you start looking for evidence that you should adjust in some way that brings you closer to them, and if you search for it you can probably find it.

        • Phil –

          Are you separating out sampling error from the issue of errors going in one direction from differential non-response?

          If so, how? Seems to me they all interact?

        • Oh, sorry Joshua, that’s just poor phrasing by me.

          Pollsters realize that whatever method they use to reach people, it’s going to be unrepresentative of the actual voters: it’ll have too many or too few young people, or too many or too few well-educated people, or too many or too few Hispanics, and so on. So they ask information about age, education, ethnicity, etc., so they can post-stratify to get a sample that matches the electorate. I _think_, though I’m not sure, that they assume that within a stratum they have an unbiased sample: Yes, their sample of college-educated middle-aged Hispanic women might happen to have too many or too few Trump voters, but they assume that’s only due to small-sample effects (what I meant when I said ‘sampling error’), not due to bias. I think that when they report the uncertainty in the poll number, this is what they mean. Whereas in fact, the bias could be the larger source of error.

          But you’re right, of course, that a biased sample is a “sampling error” too.
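
          A tiny simulation of that point, that a shared within-cell bias can swamp the reported margin of error (numbers invented):

          import numpy as np

          rng = np.random.default_rng(0)
          true_support = 0.50    # true Biden share among actual voters
          bias = 0.02            # respondents lean 2 points more Biden than voters do
          n = 1000               # respondents per poll

          polls = rng.binomial(n, true_support + bias, size=10_000) / n
          reported_moe = 1.96 * np.sqrt(0.25 / n)
          print(f"reported margin of error: +/-{reported_moe:.3f}")
          print(f"share of polls overstating Biden: {np.mean(polls > true_support):.2f}")

          The bias is smaller than any single poll's reported margin of error, yet the large majority of polls err in the same direction, which is roughly the pattern we are arguing about.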

  9. How much do exit polls from prior elections factor in these models? Are they driving likely voter models? I’m suspicious of exit polls in general, especially this year, but they seem to be the only way to know about demographics of who actually voted. Then you have demo adjustments based on those, and so on. Is this one of the sources of error?

  10. Will it be possible to pool all polls together, and also have a really fine grid of survey details? For example, we may do some multilevel analysis based on poll data down to the county level, or even zip codes, and aggregate results from that level up to the state level, adjusting for population demographics and percent of party affiliation (a sort of poststratification approach).

    It is well known that even in a red state, not all counties will go to the Republican party. In southern states, large cities tend to go to the Democratic party, and the rest of the small cities and rural areas will go Republican. However, the combined population of the small cities and rural areas is much larger than that of the large cities, so the whole state goes red. Winner takes all.

    I am not sure how the polls were done. If they never reach rural or small town population, the poll results are obviously biased. I guess all survey statisticians know this basic problem.

  11. How does the polling work? Whenever I talk to a survey provider like Dynata, they start sweating when I ask to have the same number of conservatives as liberals in the sample — conservatives seem much harder to get. It would seem that has to do with the willingness to provide a public good (the poll results) which tends to be much higher for liberals. So even with probability-based polling you’d seem to have a bias towards liberals. How do the polls correct for that?

  12. Half-seriously, isn’t claiming that the polls messed up because the result ended up inside but at the border of the 95% forecast intervals a bit like stating that a certain result “tends toward significance” with p=0.07?

    • Christian:

      Yeah, I’m not sure, I’m still struggling with this one. I felt that when first posting after the election, it makes sense for me to start by acknowledging our problems. I’m sensitive to the problem of researchers not admitting their mistakes, so I didn’t want to follow up on the election shocker with a post saying, “Hey, the result fell well inside our 95% interval, so what are you all complaining about?”

      • Fair enough and I like that attitude. Still, if being inside a 95% interval is a reason to think the model is bad, aren’t you effectively saying that things that have a 10% probability really, really shouldn’t happen, which somehow amounts to claiming that a 90% interval should really have 100% probability? (And then if all goes well from now on – I’m partial here – you will at the same time still feel embarrassed that you gave an event that actually happened a 96% probability…)

        Of course looking at all the forecast intervals for states (or other things) combined somehow you may have a case for your model being bad, but then you may not…

      • Today is definitely not the day for the fell-inside-the-interval post. I don’t think a goal of forecasting should be that the actual result falls right in the middle of the interval. I think we should adopt the business forecaster’s trick: call your model outputs the “expectation,” and say that the actual result either over- or under-performed the expectation! Here, we can say Trump over-performed, but just not enough.

        • Surely I’m not asking for a more positive post. However, isn’t the business of proper statistics to indicate uncertainty, and one well accepted way of doing that is to give 95% intervals which may be wide? And then if we are inside such an interval but borderline, we can at least say, we knew that our prediction had a degree of uncertainty and what happened is consistent with that. Winning the game of having the most precise forecast is nice and exciting, but it’s not the statistician’s mission in the first place.

        • Mostly agree, however the 95% is arbitrary and likely unsuited to the purpose of electoral forecasting versus signaling an intervention worth further study. Perhaps 50% intervals, which Andrew indicated missed the actual result.

        • > Perhaps 50% intervals, which Andrew indicated missed the actual result.

          How is that more suited to the purpose of electoral forecasting?

        • I needed to stress the connection between perhaps and purpose more clearly.

          If one needed to decide to accept employment or not in the US by Nov2, they might want a 99% interval, whereas for deciding whether the modelling needed improvement on Nov 4,5,6… _perhaps_ 50%.

        • Keith: What is your argument for 50% being better here? Of course the precise value of 95% is arbitrary but it makes sense to choose a very large probability somewhat away from 100% if we want to express “this is what is realistically conceivable” or “this is what wouldn’t be a really big surprise”. It makes a lot of sense to me to choose such a probability somewhere between 0.9 and 0.99 and voila, that’s where the conventional levels are, with 95% pretty much in the middle. These things are often called “arbitrary” but there is some sense in them. Of course accepting that whether the final result is just in or just out doesn’t make that much of a difference.

          With 50% you specify an interval that even if your model is correct you expect to miss half of the time. OK, you can interpret it as “borderline more likely than not” but as a scientist I wouldn’t be very comfortable with basing decisions on a 50% interval.

        • Indeed, what is the rationale for any particular confidence interval? Since the interval is not 100%, it really does not matter how the actual result compares, except to say something about how unusual that actual result is, or is not. What I mean is that using an interval to judge the forecast is a useless exercise – it would require that the actual result fall within a particular interval or at a particular place in the interval – either way, it undermines the whole idea of the interval.

          When the model is constructed, the actual results of the election is uncertain. I don’t think the point of a model is to predict the winner – that would require establishing a particular threshold for the prediction, sort of like choosing a threshold p value. Instead, the model produces a measure of the uncertainty in the election, and allows you to say something about how (un)usual the actual result was. I think evaluating a model in terms of whether it gave a “correct” prediction is counterproductive, akin to choosing a p value to determine “truth.”

  13. Did any model/modeler claim something like, “Biden would win this state but Trump would therefore sue in court to throw out x% of the votes such that the winner is…”?

  14. I would love to hear Andrew respond to Nassim Taleb’s criticism of forecasts, i.e., that if the forecast is allowed to vary with subsequent information then the only measure of the probability is 50%. Should forecasting just be scrapped all together? It seems like whatever number these forecasts are producing, it isn’t a probability.

    • I want a number I can use to determine a fair bet. I’m happy calling that the ‘probability’ but if you want to call it the ‘fair bet percentage’ or whatever, that’s fine with me.

        • But, I think that Taleb’s criticism is that you can bet against these forecast numbers because of the constant updating. If I took Biden at 90 and Trump at 10, I could just wait until Trump goes up to 11 and sell, and then hold onto Biden until he goes back up to 90 and then sell. In this way I could make unlimited amounts of money, if these were real betting odds. So, something is wrong.

        • If Biden is at 90 there’s no guarantee Trump will go up to 11, Biden could go from 90 to 95 and then 100. Ditto for any other switch.

          There are transaction costs so you can’t make money on miniscule variation, you need an actual shift.

          I know someone who bought Biden shares on Predictit on Tuesday. Biden had started the day at, I dunno, 70 or something, but after the Florida results came in and some other states had heavy Trump votes too, Biden dropped to something like 30. My friend bought a bunch, thinking that the market had overreacted to the news: Yes, Biden was clearly doing much worse than the central estimate of the models etc., but should still win.

          So, yeah, you could have made money on Tuesday by first buying Trump, then selling Trump and buying Biden. But if Florida had been a Biden win that opportunity wouldn’t have been there.

        • To be precise, the argument is not that the probabilities should not move with new information at all, but that expectations of future information should shrink probabilities towards 50%, and forecasts should have minimal movement as they converge towards a final value at the time of realization.

          The vagary is WHO is being critiqued and why. The rhetoric seems to point towards the periodicity in Silver’s 2016 election, but I can’t find any specific information from Taleb or Madeka as to why those swings were predictable or systemic to the forecasting method.

        • I can’t find any specific information from Taleb or Madeka as to why those swings were predictable *through information publicly available to Silver* before we go down the philosophically Bayesian rabbit hole again

        • I could be wrong, but it seems pretty clear that what I’m saying is the essence of Taleb’s argument, especially based on figure 3 in this article

          https://arxiv.org/pdf/1703.06351.pdf

          I’m also not clear where I’m contradicting your article

          This

          > Big events can still lead to big changes in the forecast: for example, a series of polls with Biden or Trump doing much better than before will translate into an inference that public opinion has shifted in that candidate’s favor. The point of the martingale property is not that this cannot happen, but that the possibility of such shifts should be anticipated in the model, to an amount corresponding to their prior probability. If large opinion shifts are allowed with high probability, then there should be a correspondingly wide uncertainty in the vote share forecast a few months before the election, which in turn will lead to win probabilities closer to 50%. Economists have pointed out how the martingale property of a Bayesian belief stream means that movement in beliefs should on average correspond to uncertainty reduction, and that violations of this principle indicate irrational processing

          seems to be essentially what I’m saying. Am I misunderstanding something?

        • I think in a time-series there’s always a source of additional uncertainty… You get uncertainty reduction when you’re learning information about a static quantity… But if you’re learning information about the weather a week from now, your uncertainty could easily widen through time.

        • Somebody:

          OK, I see. On our article when I wrote “lead to win probabilities closer to 50%,” I meant “closer to 50% than you’d get if your model didn’t allow large opinion shifts.” In the discussion above, when you said “shrink probabilities towards 50%,” I thought you were saying the probabilities should go to 50% over time during the life of the forecast, and that would not make sense.

        • I don’t think this contradicts anything anyone has said, but I just want to point out: whatever the probabilities were a few months before Election Day, they will eventually converge to 100% for one candidate and 0% for the other, with no uncertainty whatsoever.

          Also, although collecting more information should lead to lower uncertainty in expectation, in any single instance that doesn’t have to be the case: more information can make you less certain.
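
          Both points show up in a small simulation of a Bayesian forecast (a toy conjugate-normal model, nothing like the real ones): today's win probability equals the average of tomorrow's, even though individual paths can jump either way.

          import numpy as np
          from scipy.stats import norm

          rng = np.random.default_rng(1)
          prior_mu, prior_sd, poll_sd = 0.52, 0.03, 0.02   # made-up model of a latent vote share
          n_sims = 20_000

          p_today = 1 - norm.cdf(0.5, prior_mu, prior_sd)

          # Simulate one more day of polling and recompute the win probability each time
          theta = rng.normal(prior_mu, prior_sd, n_sims)        # latent vote share
          poll = rng.normal(theta, poll_sd)                     # tomorrow's noisy poll
          post_var = 1 / (1 / prior_sd**2 + 1 / poll_sd**2)
          post_mu = post_var * (prior_mu / prior_sd**2 + poll / poll_sd**2)
          p_tomorrow = 1 - norm.cdf(0.5, post_mu, np.sqrt(post_var))

          print(f"P(win) today:              {p_today:.3f}")
          print(f"mean of P(win) tomorrow:   {p_tomorrow.mean():.3f}")  # matches, up to MC error
          print(f"spread of P(win) tomorrow: {p_tomorrow.std():.3f}")   # paths still move around

          Repeat the updating enough times with informative polls and each path ends up near 0 or 1, as noted above.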

        • Perhaps in our new post-truth society, the probability should eventually converge to the probability that a random person believes their candidate actually won the election.

        • > I could just wait until Trump goes up to 11 and sell, and then hold onto Biden until he goes back up to 90 and then sell. In this way I can make unlimited amounts of money

          You can if your information that Trump *will go to 11* turns out to be true… For this to be *arbitrage* you must have that there’s a 100.000000000000% chance that Trump necessarily goes to 11. In other words, you should be able to prove this mathematically.

          So, Taleb’s claim is either 1) False if he really claims arbitrage, because it’s obviously not possible to prove that Trump must necessarily go to 11… or 2) just the ordinary claim that someone with better information can make money off someone with worse information… which is unremarkable.

        • Daniel, I really think Taleb is pulling a Motte and Bailey. He starts and ends with the intuition that a lot of uncertainty should lead binary forecasts to 50-50. He then tried to formalize an option pricing argument with a specious distraction of this martingale silliness. That was just because he eyeballed Silver’s timeline and thought it had too much volatility to be rigorous probability or something. This time around, most of the volatility was gone, yet they’re still declaring victory for their ‘martingale approach’. The bottom line is you are correct – it all just amounts to: an agent with better information can make more money, and all the forecasts probably have extra uncertainty they’re not formally accounting for, so we would be wise to discount them some amount back to 50-50.

        • Taleb is claiming “arbitrage” in the sense of being equivalent to definition 1 of a Dutch book in https://www.stat.berkeley.edu/~census/dutchdef.pdf

          or

          > Dutch book can be made against the estimating probability q(•|x) if there is a gambling system that provides a uniformly positive expected payoff to the gambler

          so it’s not profitable with probability 1 but rather a betting strategy with expected positive payoff regardless of the final outcome (Trump or Biden) in an idealized case without transaction costs. The expected positivity comes from a perceived mean reverting property of Silver’s election forecasts, which is in this case public information rather than “insider trading”. It’s not obvious to me that the big swings in the 2016 election are an inherent property of Silver’s forecasting methodology rather than specific unusual conditions since I don’t think it’s something that happened in all the other elections he did.

        • Side note, I don’t think this is a typical definition of arbitrage, though I’m not a trader. I was able to find out his definition by following the citation in his paper

          https://arxiv.org/pdf/1703.06351.pdf

          > Further, option values or binary bets, need to satisfy a no Dutch Book argument (the De Finetti form of no-arbitrage) (see [4])

          I feel like a lot of time has been wasted on terminology at this point

        • @Zhou Fang

          Since we’re following Taleb’s argument, we’re using his terminology, in which he uses “volatility” to mean “future unrealized volatility” or “volatility in the remaining time to expiration.” So his statement is, as you say, only about a fixed state of the election and I think deliberately so. That may seem like a vacuous case, but I think it’s because he’s refuting a hypothetical person who thinks that high volatility justifies large and frequent movements in probabilities. He states this in the intro

          > We observe stark errors among political scientists and forecasters, for instance with 1) assessors giving the candidate D. Trump between 0.1% and 3% chances of success , 2) jumps in the revisions of forecasts from 48% to 15%, both made while invoking uncertainty.

          both of whom would be wrong, if they existed. A statement like “the forecast is at 10% for Trump right now, but remember that Trump is an unpredictable candidate and Hilary is under investigation so that could change drastically” *is wrong*, and anticipating the investigation to have a large and unpredictable effect on public opinion *should nudge your forecast closer to 50%*. But by not stating who he’s talking about, including specific quotations, or describing an incorrect methodology for contrast, he makes it really hard to follow what he’s actually trying to say.

          For the record, I do think people who think like the above exist because I know personally some people who have said those things. Not that any of them would actually read this paper, or like Taleb makes an argument that’d be plausibly convincing to them. I do get the impression that he prefers that people continue to be wrong so he can keep insulting them.

          I have no clue what’s going on with that Mathematica graph, or what it’s trying to tell me.

        • @somebody :
          >A statement like “the forecast is at 10% for Trump right now, but remember that Trump is an unpredictable candidate and Hilary is under investigation so that could change drastically” *is wrong*, and anticipating the investigation to have a large and unpredictable effect on public opinion *should nudge your forecast closer to 50%*.

          Enh, this is perhaps pedantic but I think there’s justifiable reasons to make statements like this. What is going on in these situations is that the modeled volatility does not fully represent the modeller’s personal estimate of the volatility.

          For instance, the volatility estimate might come from an aggregation of multiple elections in the past. Then, what happens when you are confronted with an election with (apparently) substantially higher or lower volatility? Well, you can throw out all the historical data, but that’s a Lot of your sample size and you could find yourself chasing a false signal into the weeds. You can go in with a plug in value of the volatility that represents your own personal posterior assessment, but that’ll attract the ire of people with very different priors. And so on.

          So you can see that you can easily come to the situation where you end up with a model you don’t 100% believe in, and end up making statements like that. Heck, Andrew did the same in 2020.

        • Yes, in 2020 Silver’s forecasts trended higher and higher in probability (Biden wins), rather than mean-reverting, so the positive expected payoff is dubious and there is definitely no Dutch Book here. Anyhow, I agree that terminology has been massively distracting :)

        • Note that in the 2020 forecast Nathaniel has done some new things

          https://fivethirtyeight.com/features/how-fivethirtyeights-2020-presidential-forecast-works-and-whats-different-because-of-covid-19/

          most notably

          * The model is now more careful around major events such as presidential debates that can have an outsize impact on candidates’ polling averages. If candidates gain ground in the polls following these events, they will have to sustain that movement for a week or two to get full credit for it

          * The first type of error, national drift, is probably the most important one as of the launch — that is, the biggest reason Biden might not win despite currently enjoying a fairly wide lead in the polls is that the race could change between now and November.

          National drift is calculated as follows:

          Constant x (Days Until Election)^⅓ x Uncertainty Index

          That is, it is a function of the cube root of the number of days until the election times the FiveThirtyEight Uncertainty Index, which I’ll describe in a moment. (Note that the use of the cube root implies that polls do not become more accurate at a linear rate, but rather that there is a sharp increase in accuracy toward the end of an election. Put another way, August is still early as far as polling goes.)

          The uncertainty index is a new feature this year, although it reflects a number of things we did previously, such as accounting for the number of undecided voters.

          Both of which strongly suggest that Taleb was right and that Silver had previously underadjusted for future uncertainty. Nonetheless, Silver has conceded nothing publicly in the Twitter shit-flinging. Taleb, on the other hand, has claimed he wasn’t talking about 538 specifically, despite 538 predictions being in their paper as a negative example, possibly to hedge against any counterarguments that may be made about a specific case.

          Ego has ruined this entire discussion, and I’ve learned much less than I otherwise could have.

        • I didn’t quite follow all the technical details since I’m doing about 6 things at once, including helping someone install a thermostat from hundreds of miles away…

          From what I can see in the Dutch book argument, the “polls” are like the x which is observed, and the theta is the “underlying preference” which is only observed after the full election count…

          So he’s claiming that there’s some system of bets you can place so that no matter what we see in the polls, on election day you *will* make money, unless there’s Bayesian updating of the model.

          I don’t think that’s controversial. This is more or less accounting, the Bayesian system makes it so that you don’t create contradictions, and the contradictions are what let you create a Dutch book that leads to guaranteed money on election day.

          What I don’t follow, is he seems to think that this should lead to prices that stay stable at 0.5 right up until a couple days before election and then converges rapidly to the actual outcome over a few hours or a day or whatever.

          There are assumptions you can make which lead to that behavior. What I don’t see is that those assumptions are *necessary* in order to avoid being Dutch Booked.

        • The general claim is that future uncertainty pushes probabilities towards 0.5, so even if candidate A is polling at +5 100 days out AND margin of error in polling puts their share of the vote above 50% (ignoring electoral college) with probability 0.95, the probability candidate A will win is lower than 0.95 because things can change. Or, to put it another way, even if we know for certain that A would win if we held an election on day t=0, the probability that A will actually win is less than 1. The more uncertainty, the closer to 0.5. All this talk about arbitrage is just providing a decision theoretic justification for this, which is provably equivalent to bayesian reasoning. If you are willing to accept bayesian inference without Savage style philosophical arguments, you can ignore them imo.

          Most of the paper is just providing an example of the above for a case that quants have figured out how to price, in this case a brownian motion mapped to [0,1] by a logistic link function which pays out if it terminates above some threshold and pays nothing if it doesn’t. The probability is computed as a scaling factor on the price/wager such that expected payout is 0. The graph is an illustration of what would happen if you modeled the polls as intermediate realizations of said logistic Brownian motion and evaluated the probabilities as above. The volatility is an input to this model, and as you crank it up, probabilities in his model go to 0.5 as expected.

          The twitter fight claim is that Nate is in effect setting this volatility to a level lower than he should based on publicly available information about polls and public opinion. As best I can tell, the evidence is “looking at it, it swings too much.” Maybe he did the regular quantitative finance thing and used the standard deviation of the intermediate poll values from previous elections, then compared the timeseries to Nate’s? And that’s what sets the probabilities in his graph to nearly 0.5? I find that dubious since the polls for previous presidential elections just don’t seem that volatile. I could see an argument that the poll bumps near debates do constitute something that can be “arbitraged” by Taleb’s definition. But how Taleb is getting the “correct” volatility for his model is beyond me, and in my opinion vaguely specified.
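
          To see that mechanism concretely, here is a bare-bones version of that construction (not Taleb's code; the volatility values are arbitrary): a Brownian motion on the logit scale, mapped back to a share, with the win probability estimated by simulation. Cranking up the volatility drags the probability toward 0.5.

          import numpy as np

          rng = np.random.default_rng(2)

          def win_prob(current_share, days_left, daily_vol, n_sims=100_000):
              """P(final share > 0.5) when the latent logit follows a Brownian motion."""
              x0 = np.log(current_share / (1 - current_share))        # share -> logit
              x_final = x0 + rng.normal(0, daily_vol * np.sqrt(days_left), n_sims)
              final_share = 1 / (1 + np.exp(-x_final))                # logit -> share
              return np.mean(final_share > 0.5)

          for vol in (0.005, 0.02, 0.08):   # arbitrary daily volatilities on the logit scale
              print(f"daily vol {vol:.3f}: P(win) = {win_prob(0.54, 100, vol):.2f}")

          How that volatility should be set, estimated from past cycles, implied from current conditions, or something else, is exactly the part that seems vaguely specified.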

        • Somebody:

          I think Nate’s incentives changed. Before 2016, his key incentive was to provide a continuing flow of news, hence it was to his benefit to have his probabilities jump around. After 2016, his key incentive was to keep his probabilities closer to 50% to avoid a 2016-style embarrassment. As we’ve discussed, I think that’s why he ended up with unrealistically wide state-level predictive intervals: he threw in a bunch of error terms and accidentally introduced some low and negative correlations, which required him to have huge state-level uncertainties (e.g., giving Biden a 6% chance of winning South Dakota) in order to get a reasonable national uncertainty. On the plus side, his focus on the national error gave him a reasonable electoral college forecast, and that was the main thing. Also, super-wide state intervals don’t really hurt your reputation because you can still say you got 48 out of 50 states correctly or whatever.

        • I’m not sure that’s what he claims. That “perceived mean reverting property” doesn’t give you “a gambling system that provides a uniformly positive expected payoff to the gambler”. You may “buy low” but there are potential scenarios where you never get the opportunity to “sell high” later: the payoff is not uniformly positive.

        • The Dutch book in the construction is positive expected payoff over all realizations of x, conditional on all possibilities for theta, not positive realized profit with probability 1. The confusion, I think, arises from the fact that De Finetti’s example is a two-stage decision process; there’s a latent parameter theta whose value is being bet on and an observed data value x, whose distribution is conditional on theta, which the gambler conditions on. So

          1. The bookie has some prior on the hidden theta
          2. Some value x is drawn from a distribution conditional on theta
          3. The gambler places any number of bets b_i(x) that theta is in some set C_i
          4. The bookie collects the money wagered, scaling each bet by q(x, C_i)
          5. theta is revealed, and based on its value the bookie pays off the sum of the bets placed in 3

          If a system of bets b_i, C_i exists such that the expected payoff conditional on theta, integrated over p(x | theta), is strictly positive for all theta, that’s a Dutch book. If the bookie sets q to be the conditional probability that theta is in C_i, given the prior and the conditional distribution of x, there are no Dutch books and all such expectations integrate to zero.
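
          Here’s a toy numerical version of that statement (my own example, not one from the paper): theta and x are both binary, p(x = theta | theta) = 0.8, and the bookie posts q(theta = 1 | x) = 0.5 for both x, which is not the posterior under any prior. The system “after seeing x, stake 1 on theta = x” then has strictly positive expected payoff conditional on either value of theta.

          p_x_given_theta <- function(x, theta) ifelse(x == theta, 0.8, 0.2)
          q <- function(x) 0.5  # bookie's posted probability that theta == 1, ignoring x
          payoff <- function(x, theta) {
            # after seeing x, stake 1 on the set C = {x}
            if (x == 1) return((theta == 1) - q(x))   # pay q({1}|x), get 1 if theta == 1
            (theta == 0) - (1 - q(x))                 # else pay q({0}|x), get 1 if theta == 0
          }
          for (theta in 0:1) {  # expected payoff conditional on theta, over p(x | theta)
            ep <- sum(sapply(0:1, function(x) p_x_given_theta(x, theta) * payoff(x, theta)))
            cat("theta =", theta, " expected payoff =", ep, "\n")  # 0.3 for both values of theta
          }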

        • De Finetti’s Dutch book argument is about taking advantage from an “incoherent” bookie to place a set of bets (which form the Dutch book) such that we will win no matter what. We know at the time the bets are placed that we will get a positive realized profit with probability 1.

          The second chapter in that paper is an “extension to statistical inference”. If I understand it correctly, if the bookie is not Bayesian and gives us the opportunity to place bets about the unknown parameter based on every inference (i.e. for every observed x) we can take advantage of the inconsistencies to create a Dutch book so we get a positive realized profit whatever the true value of the unknown parameter. If we have only the opportunity to place bets for some observed x we can build only part of the book and only the expected value (over the whole book, i.e. over x) is positive whatever the true value of the unknown parameter.

          I may be missing something, but I don’t really see the relevance of this “extension to statistical inference” in the context of the problem under discussion. If it’s not about a gambling strategy that guarantees a gain whatever the outcome but only in expectation, it’s an expectation over what precisely? At time t we can place a single bet, how is the book constructed?

        • As for how to get the book, I don’t know, since the paper doesn’t provide a constructive proof but proceeds by contradiction to show that the payoff for an incoherent bookie can be anything.

          > it’s an expectation over what precisely?

          Over the distribution of x conditional on theta. If this conditional expectation is strictly positive over all theta, then the total expectation is also positive *no matter what the bookie’s prior over parameter space*. This is kind of weird, since x is realized when the bet is placed, but remember that the dutch book isn’t a particular bet but a system of making bets based on x, or a predetermined mapping from revealed information x to wagers made decided on before the game is begun.

        • > Over the distribution of x conditional on theta.

          What theta? What I don’t see is the relevance of that paper to the problem where a forecaster offers you at time t the opportunity to place a bet for or against the occurrence of some future event with probability p(t).

          In that paper, “A ‘finite estimation problem’ consists of (i) a finite set X, and (ii) a finite set of parametric models {p(•|θ) : θ ∈ Θ} specifying probability distributions on X. An ‘estimating probability’ q(•|x) is a probability on Θ for each x ∈ X. Consider subsets C1, . . . , Ck of Θ. After x is observed, allow the gambler to pay bi(x)q(Ci|x) in order to get bi(x) dollars if θ ∈ Ci.”

          We don’t have any of that here as far as I can see.

          > the dutch book isn’t a particular bet but a system of making bets

          Just to be clear on the terminology, the usual meaning of Dutch book is, according to Wikipedia, “In gambling, a Dutch book or lock is a set of odds and bets which guarantees a profit, regardless of the outcome of the gamble.”

          The introduction of the standard Dutch book concept in that paper is similar “Unless the odds are computed from a prior probability, dutch book can be made: for some system of bets, the clever gambler wins a dollar or more, no matter what the outcome may be.”

          In the extension to inference they give this definition “Dutch book can be made against the estimating probability q(•|x) if there is a gambling system …”. The gambling system is what you use to make a (non-standard) Dutch book.

        • For the sake of argument, let’s assume that there’s somebody forecasting the election outcomes using polls, where their only source of uncertainty is polling error. That is, the probability that the true number of people intending to vote for candidate A > 0.5 is always reported as their forecast. I know that debates produce a bump in the polls, but also that the bump is usually temporary. If I wait until a debate where candidate A wins, then bet against candidate A, then sell that bet back to the forecaster 2 weeks after, I have a strategy that produces a profit in expectation regardless of whether candidate A or candidate B ultimately wins. In the Dutch book version, the key is that you don’t make just one bet, you can make many bets and you get the sum of all of them. In this, it’s intertemporal “arbitrage”.
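
          To make that concrete, here’s a rough simulation of the strategy (all numbers and names here are mine and purely illustrative): the hypothetical forecaster prices the binary off the current poll plus polling error only, a debate gives candidate A a temporary 3-point bump, and we buy the “A loses” side right after the debate and sell it back two weeks later once the bump has decayed.

          set.seed(1)
          polling_sd <- 0.03  # the forecaster's only acknowledged uncertainty
          naive_price <- function(poll_share) pnorm((poll_share - 0.5) / polling_sd)  # reported P(A wins)
          simulate_trade <- function() {
            baseline <- 0.5 + rnorm(1, 0, 0.02)                        # A's underlying support
            poll_after_debate <- baseline + 0.03 + rnorm(1, 0, 0.01)   # temporary post-debate bump for A
            poll_two_weeks    <- baseline + rnorm(1, 0, 0.01)          # bump has decayed
            buy  <- 1 - naive_price(poll_after_debate)                 # buy "A loses" while it's cheap
            sell <- 1 - naive_price(poll_two_weeks)                    # sell it back, usually dearer
            sell - buy
          }
          mean(replicate(10000, simulate_trade()))  # positive in expectation, not with certainty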

        • > For the sake of argument, let’s assume that there’s somebody forecasting the election outcomes using polls, where their only source of uncertainty is polling error.

          I think we all agree that this is not a good forecasting method. But, for the sake of the argument, let’s assume it.

          > I know that debates produce a bump in the polls, but also that the bump is usually temporary.

          Knowing things definitely helps! I think we all agree that if you use a better model than your counterparty you can make money (in expectation).

          > If I wait until a debate where candidate A wins, then bet against candidate A, then sell that bet back to the forecaster 2 weeks after, I have a strategy that produces a profit in expectation regardless of whether candidate A or candidate B ultimately wins.

          It will produce a profit in expectation… according to your model (as I said having a correct model of the world helps). And in general it will not produce a guaranteed profit (which is what the standard definitions of Dutch book and arbitrage are about).

          > In the Dutch book version, the key is that you don’t make just one bet, you can make many bets and you get the sum of all of them. In this, it’s intertemporal “arbitrage”.

          Dutch book arguments are about exploiting inconsistencies in the probability model of your counterparty to (quoting Taleb) get a betting “edge” in all outcomes without loss. Your argument seems to be about knowing that they have a bad model (and having a better model) to get an advantage in expectation.

        • By the way, I think we also agree that the probabilistic model used for forecasting should be consistent and a sequence of forecasts should be a martingale. But there is no way to say for sure that any sequence of forecasts in (0, 1) is not a martingale, it could really be the perfect forecast obtained with the true model. And how could inconsistencies be exploited if they can only be detected (statistically) ex-post? You need additional assumptions to postulate that the exploits that would have worked in the past will work (in expectation, at least) in the future.

          Of course if you have your own forecasts which you know are good (or at least better) you don’t have to find a way to exploit inconsistencies, should they exist. You directly profit from your superior knowledge (and it’s irrelevant whether the model used by your counterparty is consistent and the forecasts a martingale under that model).

        • > And in general it will not produce a guaranteed profit (which is what the standard definitions of Dutch book and arbitrage are about).

          I agree that the definitions of Dutch book and arbitrage here do not accord with the standard ones, but it is what it is. What we want to do is avoid admitting the possibility of an expected-profitable system of betting.

          > Your argument seems to be about knowing that they have a bad model (and having a better model) to get an advantage in expectation.

          It’s more that I know their model isn’t a coherent model of the election outcome at all, but rather a model of current public opinion. They aren’t reporting the probability that candidate A wins the election, but rather that candidate A would win under a hypothetical election today, and I wouldn’t expect that reported probability to update with coherent martingality like a proper bayesian process, because it isn’t. Hence Taleb’s claim that “real probabilities don’t move like that.” Nathaniel definitely already knew that current average + polling error wasn’t the true election outcome probability, hence the “nowcast” feature. He was already using a cubic time-to-election uncertainty term to heuristically pull probabilities towards 0.5. Was it not enough, or did it behave weirdly? I don’t know.

        • > But there is no way to say for sure that any sequence of forecasts in (0 1) is not a martingale, it could really be the perfect forecast obtained with the true model.

          I agree. A true model with correct updating can still produce a sinusoid or any other shape with non-zero probability. But it can certainly be suspicious, and very large periodic swings should be very unlikely.

        • > I know their model isn’t a coherent model of the election outcome

          But you don’t know that.

          The following model is coherent: “public opinion is very stable, each day it stays the same with probability 99.99999% and moves with probability 0.00001%, with the change given by some probability distribution.”

          It may be a very bad model, but it’s not incoherent. The optimal forecast with that model is to align with the polls.

          > What we want to do is avoid admitting the possibility of an expected-profitable system of betting.

          This has nothing to do with Dutch books or intertemporal arbitrage. If some bookie offers bets for binary events always at even odds, nobody would be able to make a Dutch book against them, intertemporal or not. But it would create expected-profitable systems of betting if you know anything at all about the probability of those events.

        • I feel like we’ve gotten away from the original point, which is what is Taleb trying to say.

          Taleb claims to be responding to the idea that high levels of future uncertainty mean the probability *should* jump around a lot, citing

          > jumps in the revisions of forecasts from 48% to 15%…made while invoking uncertainty.

          So I believe he’s responding specifically to some hypothetical person who is identifying uncertainty in future public opinion with the probability that some candidate wins the election. Identifying probabilities by intermediate polls + measurement error *would be incoherent*, and Taleb believed people were doing that. His paper shows that under a simple model mapping volatile public opinion to election outcomes, coherently updated probabilities go to stability at 0.5 as volatility goes to infinity, rather than causing each intermediate probability to be sampled uniformly from [0, 1]. He also strongly implies that Nate Silver is one of these hypothetical people who don’t understand the distinction.

          It’s certainly possible to have a bad model and lose money, or have a coherent overconfident model that nonetheless has a low likelihood on historical data, and I agree that it’s impossible in general to distinguish between a bad model and incoherence just by looking at forecast timeseries.

        • somebody, I don’t think anyone disagrees that as uncertainty/volatility goes to infinity, a forecast of a binary stabilizes exactly at 0.5, and also that more time remaining until the forecasted event means more things can happen hence greater uncertainty. Taleb’s whole project here starts and ends with those observations, and takes a fairly long, confusing and mostly irrelevant detour through “intertemporal arbitrage”, martingality, and his option pricing model (that has yet to be applied in a case study or real world application AFAICT). After a long conversation with Aubrey Clayton and Dhruv Madeka, all participants agreed you cannot look at a time series of forecasts and determine *is this a martingale?* Aubrey even provided a super volatile simulated forecast series from one of Taleb’s option pricing models to prove the point. So, basically, this all comes down to the idea that with better information/model you can make money by making clever bets, but it is not guaranteed in an arbitrage sense in most any real world context, and you can only Dutch Book an incoherent bookie. Nothing much has been gained here :)

        • > somebody, I don’t think anyone disagrees that as uncertainty/volatility goes to infinity, a forecast of a binary stabilizes exactly at 0.5,

          I disagree. It stabilises at exactly 0.5 *for a fixed ‘state’ of the election*. However if we are making predictions *during* such an election cycle, high volatility implies also more extreme states of the electoral process. The overall effect is that theoretically there *is* no impact from volatility on the genuine win probabilities.

          You can confirm this by a simple simulation model in R.

          https://nopaste.ml/#XQAAAQAbAQAAAAAAAAA7G8nNSO4SwArVAwLRKaQ3NAGRMYOb48NHbW0VPPqfRUxiuEaRj9+UJWt9JFllpRJ/9PQhNKOzbItvJt3Qq+vHPpbtvgOACoNGsT6lHncufR9T4G1za0vUFXYHCE/TzzUByYFbYiqrnQ1i+0IimoNzMVPGNnv58WRuI0WWOSnFtYZV1a/Nb8NgrDvUpkOG9FsaosJa3ZIDZoOSLAlSbrJID61rruxtA89IK97K/8rFv1c=
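
          For anyone who doesn’t want to open the link, here is a minimal sketch of that kind of simulation (my reconstruction using the variable names discussed downthread, not the linked code itself): Gaussian daily changes in the Biden–Trump margin, with the win probability on each day estimated by Monte Carlo over the remaining days.

          set.seed(1)
          electioncycle <- 100   # days in the campaign
          volatility    <- 2     # sd of the daily change in the margin
          reruns        <- 2000  # Monte Carlo draws for each day's forward simulation
          dailyjumps <- rnorm(electioncycle) * volatility
          winProb <- sapply(1:electioncycle, function(i) {
            state  <- sum(dailyjumps[1:i])  # margin "if the election were held today"
            future <- replicate(reruns, sum(rnorm(electioncycle - i)) * volatility)
            mean(state + future > 0)        # P(final margin > 0)
          })
          plot(winProb, type = "l", ylim = c(0, 1), xlab = "day", ylab = "P(A wins)")
          # rescaling `volatility` scales dailyjumps and the future draws by the same
          # factor, so for a fixed RNG seed the winProb path does not change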

          Taleb is flat out wrong.

        • >> I don’t think anyone disagrees that as uncertainty/volatility goes to infinity, a forecast of a binary stabilizes exactly at 0.5,

          > I disagree. It stabilises at exactly 0.5 *for a fixed ‘state’ of the election*.

          What does that mean?

          Anyway, I think that the volatility under discussion is the volatility of vote_share(t) and the binary event to forecast is vote_share(T)>0.5

          I don’t see how your simulation relates to that.

        • > What does that mean?

          What I mean is that if Biden is 5% up on Trump in the polls at a given moment in time, then under high volatility it’s true that a well calibrated win-probability forecast will give a number closer to 0.5 than otherwise.

          OTOH, if the volatility is high, Biden is more likely to be 5, 10, 15% up (or behind) on Trump at any moment in time. This variability in the state of the election cancels out the variability in the future-event forecast, which means that a graph of projected win-probabilities over time is not (to some degree of approximation) actually affected by the degree of variability.

          >
          > Anyway, I think that the volatility under discussion is the volatility of vote_share(t) and the binary event to forecast is vote_share(T)>0.5
          >
          > I don’t see how your simulation relates to that.

          What I’m simulating is the difference in votes between Biden and Trump, i.e. (vote_share-0.5)*some large constant.

          Regard the state of the election (interpreted as the vote shares if the election happens immediately) as this value. The simulation simulates this as the cumulative sum of random gaussian daily changes in voting intention (that’s the dailyjumps), which have the specified degree of volatility.

          winProb runs the simulation forward from each point in time to see in what proportion of future instances the sum exceeds 0.

          You can transform this simulation into a vote_share > 0.5 formulation quite easily. Do e.g. a logit transform.

        • Zhou, I’d have to think about this some more, but here’s what I think you are perhaps missing. The relevant forecast is NOT *what would happen if election held today?*, but *what will happen on singular election day?* ahead. For the latter, high volatility spreads out where things “land” on that future date, and the amount of uncertainty should increase proportional to time remaining.

        • So I’m sticking with my assessment above and elsewhere :) High uncertainty –> 0.5, and time remaining should increase uncertainty. However, there is no way to judge a series of forecast probabilities and determine whether it is a martingale (an ill-posed question IMO), or whether intertemporal arbitrage is possible, etc. All that matters is whether they followed the Bayesian mechanics correctly. If you have a better model, or better data, or both, you can expect to make money betting on the final outcome. Additionally, if you knew, for instance, the inner guts of the model and had a better way to forecast the covariates they are conditioning on, you might be able to game them in a series of bets assuming they are willing to trade in and out of their positions with you, *based on your superior knowledge of the future data*. You might call this “arbitrage” but that would be a fast and loose sense of the term.

        • Chris Wilson wrote

          > Zhou, I’d have to think about this some more, but here’s what I think you are perhaps missing. The relevant forecast is NOT *what would happen if election held today?*, but *what will happen on singular election day?* ahead. For the latter, high volatility spreads out where things “land” on that future date, and the amount of uncertainty should increase proportional to time remaining.

          Zhou isn’t missing that, these are correctly simulated probabilities for the final outcome conditional on intermediate values of the process. The key distinction between this and Taleb’s example is that Taleb is talking about “future unrealized volatility”. Conditional on public opinion being frozen at a current level, if you take future volatility in public opinion to infinity probability goes to 0.5. But before looking at any public opinion, you expect higher volatility to also push your intermediate surveys to more extreme values, so the volatilities cancel out and the conditional probabilities you expect to see are independent of volatility. Or, to put it another way, volatility makes a signal of a given magnitude less informative about the final outcome, but also causes you to see stronger signals.

        • somebody, yea I see that. I think that’s a good way to explain it. However, I guess what I’m saying is the relevant forecasting task is the future unrealized volatility. In that specific sense, I am agreeing with Taleb. It should temper your probabilities toward 0.5 by some amount, but I prefer explicit model-based ways of specifying this rather than bluster. My understanding is that both Gelman and Silver’s models have drift terms to accomplish this, but this obviously misses the possibility of sharper swings driven by e.g. feedback cycles where GOP voters are blasted with media and social media coverage of the scary Democrats and their crushing mail-in vote advantage and are energized to turn out on election day, etc.

        • > What I’m simulating is the difference in votes between Biden and Trump, i.e. (vote_share-0.5)*some large constant. […] The simulation simulates this as the cumulative sum of random gaussian daily changes in voting intention (that’s the dailyjumps), which have the specified degree of volatility.

          That makes sense only if that specified degree of volatility is low relative to that large constant (i.e. if the volatility of vote_share(t) that we’re talking about is low). Otherwise, there is a non-negligible possibility that the cumulative sum of random gaussian daily changes is big enough to reach vote_shares below 0% or over 100%.

          In summary, that simulation doesn’t tell us anything about the case under discussion where the volatility of the vote share is high.

        • Carlos: That’s just an artifact of the random walk specification. It’s a sufficient rebuttal because it’s the same example Taleb uses – I think Taleb says that this is the only situation you need to look at, and I’m inclined to agree. But if you want to go more generally you would have to have a different definition of “volatility”. You wouldn’t be able to define volatility in terms of a fixed constant number, because as you identify, if the vote share is close to 0% or 100%, the volatility necessarily reduces. I think using the logit transform solves the problem though.

          Chris:
          >Zhou, I’d have to think about this some more, but here’s what I think you are perhaps missing. The relevant forecast is NOT *what would happen if election held today?*, but *what will happen on singular election day?* ahead. For the latter, high volatility spreads out where things “land” on that future date, and the amount of uncertainty should increase proportional to time remaining.

          No, I think you misunderstand. The simulation code stores “what would happen if the election is held today” as the electoral state. It then generates and aggregates what will happen on a singular day ahead by monte-carlo, producing the win probability. That’s the sum(rnorm(electioncycle – i)*volatility) term that computes the remaining volatility in the days that remain.

        • Zhou, others

          Here’s a simulation of a stochastic process on the logit scale and then we get probabilities with the usual inverse logit function. There are two probabilities here, one is the “day of” election forecast, and the other discounts based on future unrealized volatility. The latter behaves as Taleb suggests, and as I agreed above – stronger pooling towards 0.5 with increased volatility or time to forecasted event. I believe this same mechanism is present in Andrew and Nate’s models with a drift term, although Andrew is obviously in the best position to clarify that.

          I am open to corrections on where this simulation is going wrong (if it is)

          https://nopaste.ml/#XQAAAQDJAwAAAAAAAAA2GkhrVD9V4ki221mbXikQUVNvcVgJ1N1M01ZFEtsieVgGJE7FqcK4oZZG8eO65U15BuBC5ttRWeYUJhGNV6M1Mq9Cf6oVLfLFkryaN1lUqpnldW6Rz0nC/xIINuQrJQhZEOX/faz98mKn4VYtQphRNKu3ivaMnQ2c5/TJFPjlApPQCKFokEF4x+nvLCrIIEm6wY/tS/yv2cmcoiJIw2nIOPE273WyqkQEgLAzo5eUWEQbnZ/bCck9hegO5+7r1SEjDCFkcG/58HABjj0HEBkOmJ/GS+tf4IU0useGWxHMJa+ag6ynlaZe/HdfnlxF5NvEspZ037VeULKasSnyXv2cXAlZgP7O9/TDSkIDbiuN0EhnAhFJKcWUdEckkFrW0lhmKWqKuFbhVJZemVtdDZtFvdVlf5nloCAPeVL5UinrSFXBDJyNUSQWJ/e9K+g2aErFGqhBPVG7SjtaptKO6v2xxNLwL0rRAHP3P7W0sGaStuu3bP2nlyyrwA8ZMLSjWN9bQq+e533e18RBZVtTx6eXctWVIM5nI+NcaZM1mPlMRLo=

        • Chris:

          There’s an error in your code:

          logit_support[i] + sum(rnorm(1)*(electioncycle-i)*volatility))

          is incorrect.

          I mean the sum basically does nothing, while the “electioncycle-i” needs to be square rooted.

          logit_support[i] + sum(rnorm(electioncycle-i))*volatility

          will be basically exactly the same as my code.

          The second problem with your code is that you aren’t calculating win probability, you are modelling vote share. This is because your model directly uses the inv_logit(logit_support) as the win probability, instead of checking whether this is > 0.5.

          I mean, the point of the volatility in this discussion is that we are talking about how volatility in polling should feed through to the win probability, so in your formulation you’ve actually forced the win probability to be more volatile as a function of volatility. As a result you’re basically saying “if the win probability is more volatile, it’s more volatile”.

          I consider this a fixed version:

          Try changing the volatility for a fixed RNG seed. The vote share curves change but the win probabilities not at all.

          https://nopaste.ml/#XQAAAQCeBQAAAAAAAAA2GkhrVD9V4ki221mbXikQUVNvcVgJ1N1M01ZFEuQpsSVmMcdq0at5rWMv7O2YR/sL/+YsDQOMpYUyx929ahz72Y2k1HAb9qyRT4Fqam60b+MVeL03+wOKN74RSyYWLaFJ+3tHIYebJCygPhScLQ2qoscM69dsh9+sEm/AkG6vvdIQQtoJ5fUTtcagvDWlXY9+Kogzvp2emF6adNo0exIdFxFnIz/AwZ3xhvOHdDojQ2KXqmpjSdhY1rNuIrz78wM5SluXI8YvU4NkPeAz6W72bJnrfZ4xWzw6Ee7k63PCkpnhEXft+LxxuSxokiAPbK/I7/sSty+nmvBDC+xsnbWqjei2WgNvRImScz3T6WE/06G7ow/uEpS+w7eD1Ak9G8HNmakKT7H4LZklXki2R6uspJPGbqoBmvy9+WHkfRQNEQoHzoxJ+l2Gd6CQNc9Gg93hRsMfxVFUBqCvLwK8jZTYCoq+ryh8wYAQKqri/+VBdlCX1QAuXrmC6wadV0m0APULOO2+cYaVUroQ5Y2y8PQPkTa0qhiL8nEC9HBl2A97JJ61RlThXbOzqftSiS6XztIqPTg5l14hOb1E2Y+dsg+b8at5eA/T35c0qlvg7bEyNxzEhKL4XRxXrXWVasANIY8SfIvaTNUi/ltFODBwt6MbrQythodaxEChWGsKqjDu2ONFD5A/jsccPf9/6p532jMiWa9MsEhvBdeTySAMtJjfOS4zvjLR6OJZ3oidf6tW/QhO2w==
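
          Here’s a small sketch of that fixed formulation (again my own reconstruction rather than the linked code): a random walk on the logit scale, the win probability defined by whether the final vote share exceeds 0.5, and a check that changing the volatility for a fixed seed moves the vote-share path but not the win-probability path.

          electioncycle <- 100
          reruns <- 2000
          run_forecast <- function(volatility, seed = 1) {
            set.seed(seed)
            logit_support <- cumsum(rnorm(electioncycle)) * volatility  # logit-scale state each day
            vote_share <- plogis(logit_support)                         # share "if held today"
            win_prob <- sapply(1:electioncycle, function(i) {
              future <- replicate(reruns, sum(rnorm(electioncycle - i)) * volatility)
              mean(plogis(logit_support[i] + future) > 0.5)             # final share > 0.5
            })
            list(vote_share = vote_share, win_prob = win_prob)
          }
          lo <- run_forecast(volatility = 0.02)
          hi <- run_forecast(volatility = 0.10)
          isTRUE(all.equal(lo$vote_share, hi$vote_share))  # FALSE: the vote-share curves differ
          isTRUE(all.equal(lo$win_prob, hi$win_prob))      # TRUE: the win probabilities do not,
          # because plogis(x) > 0.5 is the same event as x > 0 and the volatility factor cancels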

        • Hi Zhou,
          Aha thanks for this correction #logit_support[i] + sum(rnorm(electioncycle-i))*volatility#

          The perils of late night coding after an exhausting week :)
          I will look through more carefully later, but this does seem to change things…

        • Zhou, another quick thought. With your fix to that line of code – so that it now does what I was trying to do lol – I think we can highlight the subtle distinction you are drawing with vote shares.

          My approach is to suppose there is a stochastic process for delivering win probabilities. So this would represent movement in underlying vote intention AND uncertainty in that quantity. So yes, “if the win probability is more volatile, it’s more volatile”. I’m imagining a sequence of probabilistic predictions for vote share, each of which has a posterior distribution, and hence the ability to define the probability of exceeding 0.5.

          This is what my logit_support process is doing basically. It is saying, use the inverse_logit to generate a probability for the result *if election were held on that day*. What does the future volatility in this quantity mean for how to update win probabilities?

          This is subtly different than using the logit transform as you had originally suggested, simply as a way to constrain vote share without uncertainty. So you are totally correct there.

          And I believe this difference explains what Taleb is trying to do, somehow. Whether it is the most useful way of looking at the problem is totally up for debate.

          Check out corrected version:
          https://nopaste.ml/#XQAAAQBcBAAAAAAAAAAR4HyGaDdR2Hft5ORcOcC2pFjh8eJOU7NpIdXE4FTeUeXSll0UqWOLK4gY6cvyz2N70DNj/JXV1sCSBptciyXwHL8Jk159hj8H58/DSVD940SZ/xwFRSVRWzZ0jWJiRluAe1upqCslNR61syBoaWRS2g3FmnVtPru7pU17a06bqHOkNYbhjwblBGC8s6wRcb5mjlbxa96UnqPFl+enMrJIXIisbAs2HZYeYDLwe1CznwbLqTijGdst/H0keYOo6esCcuKFIShdO5Yd3tfEZO4uVRBRHVFrhnnJJIC5deEd+LuvTzyG2QhFB1zug0MXThZIcYNCGJHZ9v9jK+pd+YiuNq4TzvfNps1T/NIihVqA6LP7mWl6W8QsnvnnC8HHVUn0nq71QIlg5Srnp4DmMscPUExPsgd6xO0rbazUZDzTZtHrsFgOhHQ8MpE1k9jW2DHBVLn8BSowvEz1ZwEw3t6J+VktT65UHo12IxYYVW3WDwbklBh0i7ZWwX1Kb8NDX9nRXaiqyItZ+pC75AMpusWuMI6qpfdSfHs+t/dwsrTYXCJoclfUIMXQ7CA/xDBXhZMVQriXde6UVqj88sVVLEylffrKKgbcIxS/GxMQxs1wlG14Pciinzkee9UbEDSx3baBjx2rmMuO/WFK/LTp2w==

        • Hi Zhou, I think the blog ate my comment so re-posting. I think you are highlighting a crucial difference in what we are modeling here. My stochastic model is representing a process for delivering win probabilities, so think of this as vote share plus uncertainty in that vote share. Imagine a sequence of posterior distributions over vote share, such that for each day you can compute the probability that vote share > 0.5 (= win). Volatility in this process subsumes both volatility in the underlying quantity as well as epistemic uncertainty. So I’m not using the logit just to constrain vote share as a deterministic quantity to (0,1).
          And I believe this is the key to what Taleb is trying to point out. What should the impact of any source of uncertainty be on our updating of win probabilities? Whether this is the right way of thinking about the problem I am not at all sure.

          With the correction you helpfully pointed out, here is my revised code:
          https://nopaste.ml/#XQAAAQBGBAAAAAAAAAAFHApi7DlcRCBRLDaveg9nJk9WIHlMQGTRIIvCFRfOq7Wtj9lPFNXoiZmIM8ooZ3NECtIwm0X68WKD0cT3pPu1ZVLSrC75uMkf5bqpRrhd7fOp2CFYh62nsblXXRoPE5S0PIuPUyMzoCn7Umr7aIImjexSRYF6JOFp+sAzA0nJayILre3zI5pGbDkGofogFIaMQLP3UMFaHJmDZprPG8pnjDzsKVng00jBdWuy8mzCa6NVinrbN0JXJtgdnTVQSt4F3g6UFB0n1/6M+VJu2nEor4FEAZaElH03FLuKsJDzXhYdOntATwkcqg3/cZrxdgdNArneSNqWElttSVk3zrbFDQ7ndlj7IEs0msTrda6TsibKsm+P21jP0zOWDq0wSQFVY/893Cy8zuM/SBdGPZqfZHfEpvnvyNhEm2OvGIDyKGmHLzbtyHGlRieQcbLG5CIjbTmrnfekaXxDsi28ur15hoVwUexQID6Oorq2qkD0uwqXSfrYfh8eAi4MF8XCTNjTwF3CRfyt6+nDXjhisTMM5wIwmF93FMxVdY3xtHlMteUxr5c7N4HNtOn+sJy5KoNcdF99VjfCjstWezvnDa5nzD4xdJWch8XF3iuPiang7h+2uPjk5lfZ/GRtUA==

        • > That’s just an artifact of the random walk specification. It’s a sufficient rebuttal because it’s the same example Taleb uses

          Your example may be relevant to Taleb’s example. I’ve never understood what Fig. 3 was supposed to show.

          But the discussion was about what happens when the volatility of the vote share is high. Using Taleb’s terminology:

          Y0 the observed estimated proportion of votes expressed in [0,1] at time t0.

          s annualized volatility of Y, or uncertainty attending outcomes for Y in the remaining time until expiration.

          Your simulation only makes sense if s*sqrt(T) is small (much less than 1).

          > I think using the logit transform solves the problem though.

          I see Chris Wilson has shared some code, I’ll look at it. My guess is that when a non-linear transformation is applied the generated winProb sequences are no longer independent of the parameter “volatility”. Changing the parameter in your simulation (keeping it low enough for the simulation to remain valid!) scales “dailyJumps” and leaves “winProb” unchanged. But your simulation is not applicable when the variance of the vote share is high.

          One suggestion about the simulation code: doing

          winProb = c(winProb, pnorm(-sum(dailyjumps), sd=volatility*sqrt(electioncycle-i), lower.tail=FALSE))

          instead of

          winProb = c(winProb, mean(sum(dailyjumps)- replicate(reruns,sum(rnorm(electioncycle – i)*volatility)) > 0))

          is much faster, generates a precise forecast and makes more clear what’s going on.
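
          A quick numerical check of that equivalence (my own snippet, with made-up values), comparing the Monte Carlo estimate for one day against the closed form:

          set.seed(1)
          volatility <- 2; electioncycle <- 100; i <- 40; reruns <- 1e5
          dailyjumps <- rnorm(i) * volatility   # shocks observed up to day i
          mc <- mean(sum(dailyjumps) - replicate(reruns, sum(rnorm(electioncycle - i) * volatility)) > 0)
          cf <- pnorm(-sum(dailyjumps), sd = volatility * sqrt(electioncycle - i), lower.tail = FALSE)
          c(monte_carlo = mc, closed_form = cf)  # agree to within Monte Carlo error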

        • Chris, is the vote share included in your model (or can it be derived somehow)?

          I would expect the forecast to be 0 or 1 on the last day (when there is no uncertainty left), but it doesn’t seem to be the case.

        • Carlos, as of right now it is not explicit, at least not the way I’m interpreting it. Rather there’s a process that delivers win probabilities that exhibits volatility. I think we could specify a model for vote share and then add epistemic uncertainty – to mimic the real world situation with inference and forecasting – and get to the same endpoint. But I’m not convinced either way yet! What I am convinced of is that this is very close to what Taleb is arguing we do somehow…

        • FWIW, I like the replicate() formulation over pnorm since it lets you drop in other specifications of the volatility if you wish and see if it makes any difference.

        • > Look at the nopaste I did at Chris more recently. That does the logit transform.

          Thanks. My first guess was wrong and it’s still the case that changing the “volatility” parameter leaves winProb unchanged. The volatility of voteShare increases (but not as much, it just gets compressed into the extremes). The residual variance at some point (the uncertainty) is not just a function of the time remaining, it also depends on the value of voteShare. Higher residual variance (uncertainty about the outcome) corresponds to winProb values closer to 50%. In any case I won’t say that Taleb is not wrong, I don’t know exactly what he claims.

        • Zhou, Carlos
          If you look at Zhou’s code, I think we see the same dynamic that I am talking about, just in a different place. The final vote shares (panel c) systematically pool towards 0.5 compared to the vote share at any given point in time. As volatility increases, there is more relative pooling, and likewise with greater time until election day.
          Excellent! I feel like this has been very productive actually. Thanks for sharing code Zhou.

  15. I am ultimately surprised that, through all the contextualising and discussion of the priors, the analysts and modellers who synthesised polling data did not seem to ask themselves these two questions before they established their priors:

    1) if you were an ardent Trump supporter & voter, how likely is it that you’d wilfully respond to a poll request if you feel it’s the kind of poll that gets covered by so-called mainstream media?

    2) If you were a Democrat of any ilk, how likely is it that you’d respond to the same request?

    I cannot imagine that the average analyst looking at the political landscape of the United States over the last 4 years would not see the very likely (in my view) expected answers to these questions. Not all Trump voters may object to a “Deep State” poll. Not all Biden voters will respond to a poll, but the biggest shock to me this polling season was the lack of recognition of this possible error and its magnitude. Even if 10% of Trump voters (relative to Biden voters) would reject polling, it gets closer to explaining yesterday’s outcome.
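
    As a back-of-the-envelope check of that 10% figure (my numbers, purely illustrative): with a true 50/50 split and Trump voters responding at 90% the rate of Biden voters, the raw poll overstates Biden by roughly 2.6 points of vote share.

    true_biden <- 0.50      # assume a true 50/50 split
    response_ratio <- 0.90  # Trump voters respond at 90% the rate of Biden voters
    poll_biden <- true_biden / (true_biden + (1 - true_biden) * response_ratio)
    round(100 * (poll_biden - true_biden), 1)  # ~2.6 point overstatement of Biden's share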

    Therefore, the lightbulb-turning-on moment for me in the last few days was reading how adamant Nate Silver was about the lack of a “shy” Trump voter.

    The blindness to the very known disdain and skepticism of Trump’s base with respect to traditional political enterprise will, in my view, come to serve as the explanation for this year’s polling error. I don’t think this error is going anywhere if a Trump-like candidate becomes a fixture of the RNC in the long-term.

    • What on earth makes you think people didn’t think about these things?

      The “Shy voter” effect, the “Bradley effect”, differential non-response… these are extremely well known among pollsters and poll analysts; it’s my impression that differential nonresponse is the single thing they worry about the most! It’s certainly wrong to suggest they don’t think about these things.

      The problem isn’t that these questions don’t get asked, it’s that it’s hard to figure out what to do. Do you just say “I underestimated Trump’s support by 5 points last time, so I’m going to add 5 points this time”? You -can- do that — Wikipedia’s article about the Shy Tory effect suggests that’s what some pollsters did twenty years ago — but surely you’d rather find a more robust way to make adjustments. For one thing, what do you do next election when Trump isn’t running?

      I’m sure pollsters thought they were doing ok on this stuff. I can’t think of any way to know for sure until Election Day. Tough problem.

    • Adam –

      > The blindness to the very known disdain and skepticism of Trump’s base with respect to traditional political enterprise will, in my view, come to serve as the explanation for this year’s polling error.

      Do you have any actual evidence of a “shy Trump voter” effect, or are you just in-filling based on a correlation to determine causation with no actual evidence?

      Yes, Trump has outperformed the polls. Or maybe we could say that Biden has underperformed the polls. We could think of many reasons why that has happened. A “shy Trump voter” effect is one of them. There is evidence, actually, that is problematic for the “shy Trump voter” theory.

      • Joshua:

        Forget talking about Trump outperforming the polls or Biden underperforming the polls. Given what happened in the congressional elections, say that Republicans outperformed the polls and Democrats underperformed.

        • “Republicans outperformed the polls and Democrats underperformed.”

          Yep. The map of US house seats is astounding.

          Democrats spent $$$ to unseat several Senate Republicans. Will be interesting to see how that plays into memes about money and elections.

        • “Democrats spent $$$ to unseat several Senate Republicans”

          I meant in a *failed bid* to unseat several Senate Republicans.

        • Yeah, people on both sides talk about “buying elections” but it’s not so easy. Money is necessary, but not sufficient.

        • I think political spending is great. How much $$$ did Bloomberg spend this year? I’m sure he created or sustained hundreds of jobs. Every election is an economic stimulus package!

        • Andrew –

          > say that Republicans outperformed the polls and Democrats undeperformed.

          Well, that may be what’s most important but it’s not a consistent pattern over time across elections. Is it unique in some consistent way to elections where Trump is involved? Is it an emerging trend? How do we know the cause? The evidence isn’t particularly consistent with the “shy Trump voter” theory (if that theory means lying to pollsters about intent and not a differential non-response).

        • From NYTimes:

          -snip-

          On the subject of the coronavirus pandemic, it is also notable that compared with most pre-election surveys, the exit polls showed a smaller share of respondents favoring caution over a quick reopening. As of Wednesday afternoon, with final adjustments still anticipated to the data, there was only a nine-percentage-point split between the voters saying it was more important to contain the virus and those saying they cared more about hastening to rebuild the economy, according to the exit polls. In pre-election surveys, the split had typically been well into the double digits, with a considerable majority of voters nationwide saying they preferred caution and containment.

          It appears that the virus was also less of a motivating factor for voters than many polls had appeared to convey. This year, the exit polls — conducted as usual by Edison Research on behalf of a consortium of news organizations — had direct competition from a new, probability-based voter survey: VoteCast, collected via an online panel assembled for The Associated Press by NORC, a research group at the University of Chicago. By looking at the divergence between the exit polls’ numbers and the responses to the VoteCast canvass, we can see that there were far more voters who considered the coronavirus a big-deal issue in their lives than people who said it was the issue they were voting on.

          The VoteCast survey found that upward of four in 10 voters said the pandemic was the No. 1 issue facing the country when presented with a list of nine choices. But in the exit polls, when asked which issue had the biggest impact on their voting decision, respondents were less than half as likely to indicate it was the pandemic. Far more likely was the economy; behind that was the issue of racial inequality.

          Not every pollster fared poorly. Ann Selzer, long considered one of the top pollsters in the country, released a poll with The Des Moines Register days before the election showing Mr. Trump opening up a seven-point lead in Iowa; that appears to be in line with the actual result thus far.

          In an interview, Ms. Selzer said that this election season she had stuck to her usual process, which involves avoiding assumptions that one year’s electorate will resemble those of previous years. “Our method is designed for our data to reveal to us what is happening with the electorate,” she said. “There are some that will weight their data taking into account many things — past election voting, what the turnout was, things from the past in order to project into the future. I call that polling backwards, and I don’t do it.”

        • dhogaza –

          > I wouldn’t put a lot of faith in exit polls. For instance, there aren’t any in Oregon … :)

          No doubt. They are likely even less reliable this year than usual – given how the pandemic has affected who showed up at the polls.

          Still – it’s info.

        • Congressional Republicans didn’t do great in 2018, but did much better in 2016 and 2020. In light of this, it seems to me that the Republican overperformance is best explained by Trump and not vice versa.

      • I do not have evidence of a “shy Trump voter”, but I do have evidence – as I hope we all do – of the disdain, skepticism, and cynicism of Trump’s *base* for any type of political enterprise or process that would not explicitly support their world view from the outset. I think this is sufficiently observed if we have been following Trump-supporting media over the last 4 years and the large amount of reporting (across the political spectrum) on Trump’s presidency. That disdain of the *base* Trump voter for what they assume to be the “Deep State” and anything perceived to be connected to it is more-than-palpable. Some of you may have a few in your family.

        In Nate Silver’s podcast on Oct. 30 (https://www.youtube.com/watch?v=Mu-tWi3s-Ow&t=2388s, minutes 19:22 to 26:30), he was close to incredulous at any suggestion (in reference to an Oct. 29 Politico article, “People are going to be Shocked”) that the Shy Trump voter existed. Among a number of statements he made, he claimed that evidence *against* a Shy Trump voter includes observations of poll respondents who, when asked this year, admitted they voted for Trump in 2016 but would now vote for Biden.

        I don’t want to exaggerate a claim of what a Shy voter is (see my definition below), but I do interpret these types of respondents as a perfect example of a potential “Shy Trump” voter in 2020… Of course that’s just me, and pure speculation.

        I think it may be very difficult to prove and identify explicit Shy Trump voters with a high degree of accuracy, as by the very definition, these may be respondents who will not always give an honest answer to a pollster (or answer the poll at all).

        But if we look at the data before us, we have an election with a near mirroring of the systematic error observed in 2016. Is there hard evidence that the shy trump voter exists? No. Is the existence of a shy Trump voter a plausible explanation of the parallel outcomes of 2016 and 2020? I would think we should consider this seriously, though I agree it may not be the only reason for 2020’s outcome.

        My theory remains that the *base* Trump/Republican voter has a different world view and rationality as a polling respondent, and their response to any arbitrary polling question is more unpredictable than a Biden/Democrat voter. It would be fascinating to develop a proof of theories like these either way. I hope after 2020 there’s a greater effort to consider these theories than outright disregard them.

        (For the record, my definition of a “shy trump voter” is broad and I apologise for not stating it. I perceive this as a respondent who lies or avoids a poll altogether. I understand that 2020 pollsters attempted to correct for observations of the latter, and felt confident that they did so in their projections.)

        • Adam –

          > I do not have evidence of a “shy Trump voter”, but I do have evidence – as I hope we all do – of the disdain, skepticism, and cynicism of Trump’s *base* for any type of political enterprise or process that would not explicitly support their world view from the outset.

          Sure. That would explain why Trump has such a following – but not why the polls underestimate his political draw.

          > I perceive this as a respondent who lies or avoids a poll altogether. I understand that 2020 pollsters attempted to correct for observations of the latter, and felt confident that they did so in their projections.

          OK. If you lump those two phenomena under the same label then I think the idea is more plausible. But I think they are kind of separate. And I think there is evidence to suggest that the first phenomenon isn’t very explanatory for the polling errors.

          If it’s the second phenomenon it might be more of a “shy Republican voter” – but even there I don’t think “shy” is a particularly apt descriptor.

        • > Sure. That would explain why Trump has such a following – but not why the polls underestimate his political draw.

          I’m not necessarily sure I would agree. I go back to my two questions in my first post. Put yourself in the shoes of a base Trump/Republican voter in 2020 versus a base Biden/Democratic voter in 2020. I do think, if you are generally mindful of the cultural/political differences between Trump’s base and Biden’s base, one would not answer those questions the same way. So this is my opinion of course, but I do think it’s possible to assume that a base Trump/Republican voter in 2016 and 2020 would engage with a polling organisation differently than a Biden/Democrat, and the atmosphere of the Trump world would suggest poll participation (or honesty) would be lower amongst that side. By how much is the question… but it doesn’t seem it would need to be a significantly large proportion of Trump voters acting in this “shy” manner to explain polling error.

          > If it the second phenomenon it might be more of a “shy republican voter” – but even there I don’t think “shy” is a particularly apt descriptor.

          You are right, I agree. A “shy” voter is not the right term for what I am trying to identify. A “disaffected” voter, maybe? … given the view that a credible pollster is an agent of the “Deep State”? … that’s probably not right either, but it’s true the “shy” term is not appropriate…

  16. “the polls messed up”

    Did they? I mean OK it came out much closer than expected. OTOH this was hard-fought right up to the wire and heavy issues were on the line.

    If a candidate makes a major position statement that shifts people’s opinions, how long does it take that to show up in the polling numbers? Is it necessarily instant?

    Trump made several major moves on environmental issues in the last week and these could have pulled some people who’d left the Trump camp back into it. Also there was a lot of talk about stacking SCOTUS, which also could have repelled some people from Dems.

    That’s one reason the margin of error is wide, because there’s no way to know how long it will take major changes to show up in polls, and if you’re polling only a small number of people it would be easy to miss a modest shift in opinion.

  17. FWIW, I made a polls-based forecast of the Biden share of the 2-party vote last Thursday, and posted it along with contemporaneous forecasts from FiveThirtyEight and The Economist:

    https://thehumeanpredicament.substack.com/p/presidential-election-forecast

    We’ve only got 31 states with >99% of precincts reporting, but based on those, here’s how the contest is shaping up:

    Bias: +1.2% (me), +2.9% (Silver), +2.7% (Gelman)
    Root mean squared error: 2.4% (me), 3.4% (Silver), 3.1% (Gelman)
    Mean absolute error: 1.7% (me), 2.9% (Silver), 2.8% (Gelman)
    Median absolute error: 1.1% (me), 2.8% (Silver), 3.2% (Gelman)
    Max absolute error: 6.8% (me), 7.8% (Silver), 6.8% (Gelman)

    All underestimated Trump, but mine underestimated him the least, and that’s what seems to have made the difference. (I also got the Florida winner right, which is the only state the other two have conclusively gotten wrong at this point.)

    I’ll elaborate on the details of my approach at some point, but the short story is I assumed the polls would be roughly as wrong this year as they were in 2016. Averaging the polls plus a Trump-specific bias correction seems to get you most of the way there. The mistake was assuming that weighting for education was going to fix the problem, which most pollsters have been doing since the last go round.

  18. There are now more than 70 comments, and not a single one mentioned voter suppression as a possible explanation. I’m surprised. Do polls adjust for rampant voter suppression in key areas?

    • I don’t think there was really a lot of evidence for this on election day. One bit of evidence would be the very high percentage of registered voters who did vote (high by our standards). Record turnout by both Dems and Repubs, most areas having short lines, states expanding early/mail-in voting which was heavily used, etc.

      Also, people were on the lookout, plenty of monitoring of polling places. It apparently was clean.

  19. I wonder what the poll error looks like as a function of time (meaning now, 4 years ago, 8 years ago, etc). If there’s a lot of autocorrelation you could improve on the polls just by assuming there’s 0.6x as much error as last election, or whatever the number is.

  20. Andrew,

    Do your concepts of Type M & Type S errors apply here and to the polling errors more generally?

    I always get suspicious when I see polls with 700-1000 subjects when the sides of an issue or the candidates are close to one another in proportion.

    Maybe I’m misreading things but don’t these create the very situation of sign and magnitude errors you have catalogued in other social research?

  21. It’s genuinely great to hear “We messed up (again), let’s figure out how so we can do better next time”.

    But honestly, come 2024, my prior will be that election forecasting in general is about on the level of kids playing grownup. I’m probably done listening to the “sages”.

  22. There are no Shy Trump supporters. I can’t believe the pollsters still believe that crap. It’s really much simpler than that — They are actively LYING to you. Not refusing to answer. Not being hard to find. Intentionally spreading disinformation.

    Why would they do that? Because you are the enemy, that’s why. Yeah, you don’t think you are, but that’s what YOU think, not what THEY think. You may think you’re looking for some kind of scientific truth (or at least accuracy). That’s not how they think at all. And it’s really quite obvious.

    For the right wing, politics has replaced religion and even patriotism. The ONLY thing that matters to them is their politics. If you are expressing a fact that hurts their power, it doesn’t matter if you’re right — YOU ARE THE ENEMY.

    Look at what happened (again) at Fox News when they called Arizona for Biden, contradicting Trump who claimed to be winning there. It didn’t matter in the least that their election desk was right, nor that they were part of Fox News, the president’s personal propaganda machine. Anyone not supporting their agenda is the Enemy.

    Similarly, it doesn’t matter how wrong you are or how unscientific. If you blather nonsense that supports the right it is accepted. Facts are not valued. Objectivity is not valued. Truth is not valued. Only loyalty to the cause matters.

    Or you could look at it another way: Everyone in the media except for Fox News and right wing radio is “The Liberal Media”. Where do your polls appear? In The Liberal Media. Polls are not produced for true results but as ammunition for the enemy. Therefore Pollsters are the enemy, so why NOT lie to them?

    They are subverting the polls because the polls hurt them. And they don’t care if what the polls report is true. All that matters is that they are used against them.

    • I don’t doubt that _some_ Trump voters lie to pollsters. Perhaps some Biden voters do too. But I don’t see how you can be so convinced that this is such a big effect.

      As Andrew and others have pointed out, a higher-than-expected turnout of Republican supporters seems to be more consistent with the data than either a “shy Trump voter” or “lying Trump voter” effect.

      • It could be a big effect combined with differential nonresponse. So if most Trump voters don’t want to talk to pollsters, and therefore don’t answer or hang up, except those who REALLY WANT TO LIE who actively try to answer all the pollster’s calls and tell lies, you’ll get hugely biased information.

        • I cannot imagine how respondents are found anymore. I don’t respond to any calls not on my white-list. However, in my very narrow frame of experience I can report the following phenomenon: recorded or robotic calls coming from entities to which, in principle, I am sympathetic; but the fact that they cannot be bothered except to play a recording angers and disappoints me. The same for the robotic e-mails. If I may extrapolate from my own fragmentary experience, I would say that whichever entity (or ’cause’) manages to approach its supporters directly, in a sincere, non-mechanized manner, will eventually hold the advantage. Kierkegaard tells an anecdote somewhere about a fool of a brewer, who thinks to push up his sales by selling each bottle a penny less than its cost; when someone points out the impossibility of it, he objects, “No! you don’t understand: it’s the big number that does it!”

  23. A question based on ignorance: to what extent does the polling procedure within a state account for spatial/geographic heterogeneity of those polled? Put another way: how do the actual data collection mechanisms take into account potentially highly localized variations? Do they take this into account adequately?

    Imagine sampling temperature at different points. Suppose temperature is very high (say 90) in two or three highly localized regions while generally it is around 50. If there are steep, highly localized, gradients then one can mistakenly infer that the temperature is close to uniform. A spatially random sample might misjudge the temperature distribution badly by simply missing these highly peaked regions. If one knows a priori where such regions might be, one can fix this by sampling a bit more in these regions. I assume something like this is done at the level of zip codes or census tracts, but perhaps the spatial localization is even stronger or perhaps it is not done well. It might simply be that one needs to poll more people than a priori seems correct, in order to make the poll sufficiently finely grained in spatial terms.

    For example, thinking of Florida, suppose that support for Trump concentrates heavily in certain neighborhoods in Miami (is this true?) even though Miami on average is slightly pro Biden (but noticeably less so than Broward county). If one’s sample is not weighted, it might underestimate the effect from these neighborhoods.
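
    To illustrate the worry with a toy example (mine, not tied to any real polling design): a handful of small hot spots pull the true mean up, an unweighted random sample is unbiased on average, but most individual samples miss every hot spot and come in low.

    set.seed(1)
    n_areas <- 1000
    temp <- rep(50, n_areas)
    temp[sample(n_areas, 5)] <- 90  # five highly localized hot regions
    true_mean <- mean(temp)         # 50.2
    sample_means <- replicate(5000, mean(sample(temp, 100)))  # simple random samples of 100 areas
    mean(sample_means)              # close to 50.2: unbiased on average...
    mean(sample_means < true_mean)  # ...yet roughly 60% of samples miss every hot spot and land low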

      • I don’t know if they’re sorting them, but I could see why you’d want to: you’ve got a big stack of ballots that are all supposed to be for Biden, and another big stack that is supposed to be for Trump. If you want to, say, have someone go through by hand to verify, all they have to do is look at them one at a time and pull out any that aren’t what they are supposed to be. Whereas if you’ve got a big stack of mixed ballots, you have to re-tabulate everything.

  24. Given the extraordinary circumstances of this election, have you considered that your model might just indicate there’s a big pile of uncounted ballots in a post office somewhere in Florida?

  25. I was just reminded of the social circle polling method (ask people about their social circle), studied by Galesic, Bruine de Bruin et al:
    https://www.nature.com/articles/s41562-018-0302-y

    Actually some of Galesic’s other work too, on things like social sampling affecting perceptions of broader populations and biases in how people estimate the size of minority groups in networks, also seems useful here.

    If you believe in a shy Trump voter then could this have helped this time around? It would seem like it, since you’re no longer depending only on the Trump voters to get information about them. Plus your poll data might capture latent info about social influence. But I’m not a polling expert. Is there any reason to believe this isn’t something that could work in place of current polling methods?

      • I also used a similar approach in a different context. I was researching child labor in a set of developing-country contexts using employer surveys, and I worried that I might not get honest answers. So I asked both individual questions (“How much are your child workers paid?”, “For which of these reasons do you employ children?”) and what I called ecological questions (“How much do other employers in your community pay children for their work?”, “For which of these reasons do other employers in your community employ children?”). As it happened, there were no problems with response rates on the individual questions, and the answers checked out well with other data. Correlations between responses to the individual and ecological questions were very high. If you’re interested you can look here. (A toy comparison of direct versus social-circle questions is sketched below.)
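
        A minimal sketch of the contrast in these two comments, with made-up numbers (the 48% support level, the 10% “shy” rate, and the 10-contact circles are assumptions, and homophily in social circles is ignored): shy respondents bias the direct question downward, while the social-circle question still picks up their preferences through their contacts.

        ```python
        # Toy sketch (assumed numbers): a direct self-report question versus a
        # social-circle ("ecological") question when some supporters are shy on
        # the direct question but their contacts still see their true preference.
        import numpy as np

        rng = np.random.default_rng(1)

        n = 1000                  # respondents (assumption)
        true_support = 0.48       # assumed true share for candidate A
        shy_rate = 0.10           # assumed share of A supporters who deny it directly

        is_supporter = rng.random(n) < true_support
        shy = is_supporter & (rng.random(n) < shy_rate)

        # Direct question: shy supporters say "no", everyone else answers truthfully.
        direct_answers = is_supporter & ~shy

        # Social-circle question: each respondent reports the share of A supporters
        # among 10 randomly drawn contacts, whose true preferences they can see
        # (this ignores homophily, which is the main real-world complication).
        contacts = rng.integers(0, n, size=(n, 10))
        circle_reports = is_supporter[contacts].mean(axis=1)

        print(f"true support:            {is_supporter.mean():.3f}")
        print(f"direct-question poll:    {direct_answers.mean():.3f}")
        print(f"social-circle question:  {circle_reports.mean():.3f}")
        ```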

  26. What do you think of this comment from Harry Enten:

    > But to judge the extent of the polling errors we really got to wait until vote counts are more complete. I’ll leave you with a scary thought: I’m not sure any individual poll average miss in a swing state was outside the 95% confidence interval for state polling.

    https://twitter.com/ForecasterEnten/status/1324353219724804099

    Basically, the effective MOE on state polls is higher than what pollsters report, because historically state polling errors have been much larger than the reported sampling error alone would suggest. In this setting, pundits like Harry really can make the horserace quite confusing, because there shouldn’t be any expectation that state polling is very accurate to begin with. (A back-of-the-envelope version of this point is below.)
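
    To put rough numbers on that, here is a back-of-the-envelope sketch; the 600-person sample and the 3-point historical nonsampling error are illustrative assumptions, not figures from Enten.

    ```python
    # Back-of-the-envelope: the reported MOE reflects only sampling error; an
    # "effective" MOE also needs a term for nonsampling error (frame problems,
    # nonresponse, late swings). All numbers below are illustrative assumptions.
    import math

    n = 600                    # assumed typical state-poll sample size
    p = 0.5
    sampling_moe = 1.96 * math.sqrt(p * (1 - p) / n)       # reported-style MOE

    nonsampling_sd = 0.03      # assumed historical nonsampling error (~3 points)
    effective_moe = 1.96 * math.sqrt(p * (1 - p) / n + nonsampling_sd ** 2)

    print(f"reported-style MOE: +/- {100 * sampling_moe:.1f} points on a vote share")
    print(f"effective MOE:      +/- {100 * effective_moe:.1f} points on a vote share")
    # (errors on the Biden-minus-Trump margin are roughly twice these)
    ```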

    • Ah. I see. It’s like the banking crisis: you pool a really large number of really bad loans (largely fraudulent loans made by local banks; note that these were NOT the “sub-prime” loans), tell investors (and yourselves) that the pooling magically makes the conglomerate a good investment, and then are surprised when the scheme implodes.

      (Actually, I was wondering about this. During the run-up to this election, I noticed that the sample sizes in the polls were really small: a few hundred or so. Was that a large enough sample, I wondered? My memory has it that the polls I read about in the Japanese press have much larger samples, multiple thousands. My hypothesis is that with the demand for frequent polling, the amount of money available to spend on each poll goes way down.)

      OK, bashing people after the fact isn’t nice. Sorry.

      Serious question: I’ve been noticing recently that people are trying to draw conclusions from exit polling data. This seems seriously nuts, since we know that anyone who understands that Covid-19 isn’t a liberal scam voted early, so exit polls are meaningless this year.

      But people are still talking seriously about exit polls. Is that as nuts as it seems?

      • David,

        “People” are going to keep doing exit polls and talking about them, just like they’re going to keep trying to aggregate pre-election telephone polls and pretend they can tell us something. There is a huge viewership for all that sort of forecasting, and “people” will continue with it until they are finally laughed out of the room after some future election.

      • > (Actually, I was wondering about this. During the run-up to this election, I noticed that the sample sizes in the polls were really small: a few hundred or so. Was that a large enough sample, I wondered? My memory has it that the polls I read about in the Japanese press have much larger samples, multiple thousands. My hypothesis is that with the demand for frequent polling, the amount of money available to spend on each poll goes way down.)

        It’s much better to have multiple polls with a few hundred or so respondents each, especially from different pollsters, than one big poll with several thousand. Uncertainty from sample size alone is much less important than bias in sampling, and you can only get an assessment of that bias by repeating the poll with different pollsters/methodologies. (A toy simulation of this trade-off is sketched below.)
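
        A minimal simulation of that trade-off, under assumed numbers (52% true support, a 2.5-point standard deviation for each pollster’s house bias): one big poll inherits a single pollster’s bias in full, while the average of ten small polls from different pollsters shrinks the bias and, just as importantly, lets you see the spread across pollsters.

        ```python
        # Toy simulation (assumed numbers): sampling error vs. pollster "house" bias.
        # One big poll carries a single house bias in full; several smaller polls
        # from different pollsters average their biases down and reveal the spread.
        import numpy as np

        rng = np.random.default_rng(2)

        true_p = 0.52              # assumed true support
        bias_sd = 0.025            # assumed sd of a pollster's house bias
        reps = 5000

        def one_big_poll(n=5000):
            p = np.clip(true_p + rng.normal(0, bias_sd), 0, 1)   # one house bias
            return rng.binomial(n, p) / n

        def average_of_small_polls(k=10, n=500):
            ps = np.clip(true_p + rng.normal(0, bias_sd, size=k), 0, 1)  # k pollsters
            return (rng.binomial(n, ps) / n).mean()

        def rmse(x):
            return np.sqrt(((x - true_p) ** 2).mean())

        big = np.array([one_big_poll() for _ in range(reps)])
        avg = np.array([average_of_small_polls() for _ in range(reps)])

        print(f"one 5000-person poll,            RMSE: {rmse(big):.4f}")
        print(f"average of ten 500-person polls, RMSE: {rmse(avg):.4f}")
        ```

        Of course, if the house biases are correlated across pollsters, which is roughly what a systematic polling miss looks like, the averaging buys much less.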

  27. Wait. There is definitely no “shy Trump voter” effect. Trump explains it: it isn’t that Republicans lie to pollsters, it’s that the pollsters are faking the results to suppress the vote.

  28. Hi all,

    I’m blowing in from epidemic forecasting, where we have our own problems (you may have noticed a few).

    Why don’t pollsters just release their raw data and let the modellers/statisticians have a go? In epidemic modelling there is *nothing* more annoying than serological surveys that report only a “corrected” figure, reweighted for biases in the serological sample population relative to some background demographics.

    The reason this is vexing is that infection risk is itself heavily skewed across groups, and part of the epi modelling job is to figure that out from the real data, not from headline figures where some medical statistician, who hasn’t read a stats paper since 1995, has already applied their dummies’-guide bias-correction overlay, which you then need to try to unpick. (A minimal sketch of that kind of reweighting is below.)
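
    For readers who haven’t seen the kind of reweighting being complained about, here is a minimal post-stratification sketch with made-up numbers (the age groups, cell counts, and census shares are all assumptions). Once only the weighted headline figure is published, the raw cells it was built from are gone and the correction can’t be redone or extended.

    ```python
    # Minimal post-stratification sketch with made-up numbers: reweight a sample
    # so its age mix matches assumed census shares, then report the "corrected"
    # rate. Publishing only the corrected figure hides the raw cells behind it.
    raw = {               # age group: (sample size, number testing positive)
        "18-39": (400, 20),
        "40-64": (300, 30),
        "65+":   (100, 15),
    }
    pop_share = {"18-39": 0.35, "40-64": 0.40, "65+": 0.25}   # assumed shares

    crude = sum(pos for _, pos in raw.values()) / sum(n for n, _ in raw.values())
    weighted = sum(pop_share[g] * pos / n for g, (n, pos) in raw.items())

    print(f"crude (raw) rate:     {crude:.3f}")
    print(f"post-stratified rate: {weighted:.3f}")
    ```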

  29. What do you make of Trump enthusiasm and consequent turnout? In 2016 and 2020, Trump drew massive crowds, caravans, and flotillas, while Clinton and Biden had small, tepid events. It might be difficult to quantify, but there was an obvious difference.

    Obama similarly generated large crowds, with a religious fervor around “hope and change.” Intensity of support might be used alongside average support to predict votes.
