Politics is not a random walk: Momentum and mean reversion in polling

Nate Silver and Justin Wolfers are having a friendly blog-dispute about momentum in political polling. Nate and Justin each make good points but are also missing parts of the picture. These questions relate to my own research so I thought I’d discuss them here.

There ain’t no mo’

Nate led off the discussion by writing that pundits are always talking about “momentum” in the polls:

Turn on the news or read through much of the analysis put out by some of our friends, and you’re likely to hear a lot of talk about “momentum”: the term is used about 60 times per day by major media outlets in conjunction with articles about polling.

When people say a particular candidate has momentum, what they are implying is that present trends are likely to perpetuate themselves into the future. Say, for instance, that a candidate trailed by 10 points in a poll three weeks ago — and now a new poll comes out showing the candidate down by just 5 points. It will frequently be said that this candidate “has the momentum”, “is gaining ground,” “is closing his deficit,” or something similar.

Each of these phrases are in the present tense. They create the impression that — if the candidate has gone from being 10 points down to 5 points down, then by next week, he’ll have closed his deficit further: perhaps he’ll even be ahead!

But, as Nate points out, this ain’t actually happening:

Say that a candidate has improved her position in the polls from August to September. Is her position more likely than not to improve further from September to October? . . . this is not what we see at all . . . Sometimes, a candidate who has gained ground in the polls continues to do so; otherwise, the trend reverses itself, or the race simply flatlines. . . . There is also no sign of momentum we look at the change in polling between other periods. . . . In general elections, the direction in which polls have moved is not predictive of the direction in which they will move. [italics added]

I like Nate’s analysis. It’s very much in the Bill James style, but with graphs.

Consider the time scale

Enter Justin Wolfers, who writes that Nate is all wrong, that there is momentum in political polling.

Justin argues that Nate made a mistake by using the same data in his “before” and “after” comparison. Suppose you have poll averages at times A, B, and C. Nate is saying that the change from A to B does not predict the change from B to C (or, to be precise, that the change from A to B is a very weak and negative predictor of the change from B to C). Justin says that you should gather data at times A, B, C, and D, and see if the change from A to B predicts the change from C to D. Using a time series of unemployment data (since he didn’t have ready access to poll summaries), Justin indeed finds a positive correlation between B-A and D-C.

How to adjudicate the Silver/Wolfers dispute? My quick answer is that they’re both correct. Yes, Justin is correct that if you want to measure the persistence of trends, the D-C vs. B-A comparison can be better than the C-B vs. B-A comparison.

But Nate is correct too in that he’s answering a direct question. People really do look at the change from A to B and then try to extrapolate it to C. And Nate’s analysis directly addresses the problems in trying to do this extrapolation.

From a statistical standpoint, any study of trends has to consider the time scale. At the shortest time scales, there is certainly “momentum” in the underlying time series but the actual measurements are very noisy, making it difficult to see any trend. At longer scales, you can average to get lower noise levels, but then trends are not so stable. You can see some of this from the unemployment series that Justin posted, which is a mix of long-term variation on the 5-year scale and short-term noise of the monthly measurements.
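To make the A/B/C/D comparison concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not something estimated from polls or taken from Nate's or Justin's posts: the underlying opinion series drifts with some persistence (controlled by `rho` and `drift_sd`), and each poll average adds independent measurement noise (`noise_sd`).

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_polls(n, rho=0.9, drift_sd=0.1, noise_sd=0.3, seed=1):
    """Noisy measurements of a latent series whose changes are persistent.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    level, drift = 0.0, 0.0
    observed = []
    for _ in range(n):
        drift = rho * drift + rng.gauss(0, drift_sd)      # persistent trend
        level += drift                                    # latent opinion
        observed.append(level + rng.gauss(0, noise_sd))   # noisy poll average
    return observed


def momentum_correlations(observed):
    """Return corr(B-A, C-B) and corr(B-A, D-C) for consecutive time points."""
    diffs = [b - a for a, b in zip(observed, observed[1:])]
    adjacent = corr(diffs[:-1], diffs[1:])    # the two changes share measurement B
    separated = corr(diffs[:-2], diffs[2:])   # the two changes share no measurement
    return adjacent, separated


polls = simulate_polls(20000)
adjacent, separated = momentum_correlations(polls)
print(f"corr(B-A, C-B) = {adjacent:.2f}")
print(f"corr(B-A, D-C) = {separated:.2f}")
```

With these made-up settings the adjacent-change correlation comes out negative, because the noise in the time-B measurement enters B-A and C-B with opposite signs, while the separated-change correlation comes out positive. Changing `noise_sd` relative to `drift_sd` changes how strong each effect is, which is the time-scale issue in miniature.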

Public opinion is not a random walk

Nate does slip up at one point, when he writes:

In races with lots of polling, instead, the most robust assumption is usually that polling is essentially a random walk, i.e., that the polls are about equally likely to move toward one or another candidate, regardless of which way they have moved in the past.

This isn’t quite right. For many races, you can use a forecast from the fundamentals to get a pretty good idea about where the polls are going to end up. Our original example here is the 1988 presidential election campaign. Even when Michael Dukakis was up 10 points in the polls, informed experts were pretty sure that George Bush was going to win. Or, if you want to focus on congressional elections, take a look at the work of Erikson, Bafumi, and Wlezien, who find predictable changes in the generic opinion polls in the year leading up to the election.

Nate’s error, I think, is coming from two sources. First, individual polls are noisy, and so any immediate changes are likely to be noise. Second, the random walk is such a standard paradigm for statistical noise that it’s natural for Nate to use it as a default. “A random walk down Wall Street” and all that. But polls are not stock markets. As I’ve said many times, a poll is a snapshot, not a forecast. There really can be predictable changes in the polls, even if they’re hard to notice amid short-term variation.

If you switch the application area from politics to baseball, I think Nate would see this right away. Are the baseball standings a random walk? No. If a team has an unexpectedly good record midway through the season, it’s likely that they will slip in the standings during the second half. (I haven’t looked at the actual numbers for baseball teams, but general statistical principles would suggest that the best prediction for a team’s winning percentage during the second half of the season, given the team’s record in the first half, would be something in between the preseason prediction and the team’s midseason record.)
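As a rough sketch of that "something in between" prediction, one simple way to combine the two pieces of information is a weighted average in which the preseason prediction counts as a certain number of pseudo-games. This is a toy model, not an analysis of real baseball data: the weight `prior_games` and the example record below are hypothetical.

```python
def shrunk_win_pct(preseason_pct, prior_games, wins, games_played):
    """Weighted average of the preseason prediction and the observed record.

    prior_games is a hypothetical weight: how many games' worth of
    information the preseason prediction is assumed to carry.
    """
    return (prior_games * preseason_pct + wins) / (prior_games + games_played)


# Hypothetical team: predicted to play .500 ball, but 50-31 at midseason.
preseason = 0.500
wins, games = 50, 81
estimate = shrunk_win_pct(preseason, prior_games=70, wins=wins, games_played=games)
print(f"midseason record: {wins / games:.3f}")
print(f"second-half prediction: {estimate:.3f}")
```

The estimate always lands between the preseason prediction and the midseason record. The larger the assumed `prior_games`, the more the prediction is pulled back toward the preseason number.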

Why this bugs me

From a political science standpoint, I want to continue pushing back hard on this because I believe the random walk story has contributed to major misunderstandings about politics. (I was going to write “the random walk story contributes to . . .” but then decided to follow Nate’s recommendation to describe the past as past rather than to implicitly extend trends into the future.)

The natural accompaniment to the random walk model is the idea that, if you can shift the polls by X percentage points at any point during the campaign, this will give you an expected X percentage point advantage when the election comes around. Another implication is that George Bush’s campaign must have been truly awesome, and Michael Dukakis’s campaign truly horrible, to explain the big swing in the polls in the months leading up to the 1988 election. (We think a much more plausible story is that the polls were gonna swing toward Bush big-time, and the perceived incompetence of Dukakis’s campaign was a consequence, not a cause, of this polling shift.)
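The difference between the two stories can be put in a toy calculation. The sketch below assumes, purely for illustration, that a deviation from the fundamentals-based forecast shrinks by a constant factor each week; the function name, the weekly `persistence` value, and the 20-week horizon are all hypothetical. Setting `persistence` to 1 recovers the random walk.

```python
def expected_carryover(shift, weeks_left, persistence):
    """Expected size on election day of a poll shift made weeks_left weeks out.

    persistence = 1.0 is the random walk: the whole shift is expected to last.
    persistence < 1.0 is a simple mean-reversion assumption: each week the
    gap from the fundamentals-based forecast shrinks by that factor.
    """
    return shift * persistence ** weeks_left


# A 5-point shift, 20 weeks before the election (hypothetical numbers).
random_walk = expected_carryover(5.0, weeks_left=20, persistence=1.0)
mean_reverting = expected_carryover(5.0, weeks_left=20, persistence=0.9)
print(f"random walk: {random_walk:.1f} points expected to remain")
print(f"mean reversion: {mean_reverting:.1f} points expected to remain")
```

Under the random walk the full 5 points are expected to survive to election day. Under this particular mean-reversion assumption, well under 1 point does, which is why the two models lead to such different readings of an early campaign swing.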

In summary, “momentum” can exist, but the places where you’ll see it are races where current public opinion is out of step with best predictions. The mere information that a race has a 5-point swing is not enough to predict a future shift in that direction. As Nate emphasizes, such a prediction is only appropriate in the context of real-world information, hypotheses of “factors above and beyond the direction in which the polls have moved in the past.”

14 thoughts on “Politics is not a random walk: Momentum and mean reversion in polling”

  1. > you can use a forecast from the fundamentals to get a pretty good idea about where the polls are going to end up.

    Maybe I don't understand, but isn't Nate saying that you can't predict the *next* poll result *based solely on previous poll results* (i.e. the trend)?

    This makes "fundamentals" and "where the polls are going to end up" sort of irrelevant to evaluating his statement, doesn't it?

  2. Dear Andrew! May I ask for your advice, or even just some basic references, concerning my question? Imagine a panel of judges (a jury) at a competition. Their scores are in the range 1 … 10. A dozen teams meet each other in various combinations over a few rounds. There is a suspicion that some of the judges are "not indifferent" to their own teams and are accordingly stricter with the competing teams. Is it possible, by analyzing the scores statistically, to determine which of the judges are departing from fairness? It seems that a Bayesian approach would be helpful. Thank you.

  3. Steve:

    My point is that the polls are not a random walk. The issue of "where the polls are going to end up" is not directly relevant to Nate's statement about polls and momentum, but it is relevant to his claim that the polls are a random walk.

  4. [Sorry, this will be a little long]

    Mark Pickup has written some papers that touch on this, covering elections in the US, Canada, and the UK. In every auto-regressive model of elections that takes house effects into account, the estimated auto-regressive coefficient is always very close to 1. Strauss looked at the 2004 presidential elections with a Bayesian auto-regressive model and found a similar result.

    I also did some pretty rudimentary unit-root tests on the 2008 national race (No missing days!) a while back and couldn't reject the unit-root null hypothesis.

    This probably wasn't true 20 years ago, but the nature of campaigns and politics has changed drastically since then. Campaign spending is a lot higher, the media landscape is very different, and partisanship is much more well defined. This creates an environment where information disseminates very quickly, which reduces momentum, and opinion becomes much more persistent, which reduces mean reversion.

    Evidence is strong that the economy is the primary driver of political outcomes. But on the time-scale relevant to campaigns, changes in the economy are usually small and hard to measure for forecasters (Remember that it took us a year and a half to realize we were in a recession).

    To bear that out with some numbers: The generic ballot, if we only look at LV polls, has been completely stable since January. No Bafumi-style slide! The Senate has shown the same result: most of the races have been pretty stable this year, and about half of the shifts have been in favor of the Democrat!

    In the longer run, this isn't true: You were correct to predict that the bad economy would be bad for Democrats. But the point where it becomes difficult to predict election changes comes earlier than you think.

  5. David:

    As you point out, the time scale is important. The shorter the time scale, the more it looks like a random walk. If you fit an AR(1) model to time series data, the correlation parameter gets fit to the autocorrelation at the shortest scale. In addition, when you fit an AR(1) model to data with a trend, you'll get high autocorrelations. To put it another way, my "mean-reversion model" is not the same as an AR(1). For one thing, in a classical AR(1) process, the series starts with a random draw from the stationary distribution and proceeds from there. In a mean-reversion process such as the 1988 election campaign, you can start far from the stationary point. To put it yet another way, the current state of public opinion does not necessarily reflect anything like an accumulation of all available information. The stock-market analogy doesn't hold.

    To get back to your main point: yes, you'll see a lot less signal and a lot more noise if you look at short-term fluctuations.

  6. "To put it yet another way, the current state of public opinion does not necessarily reflect anything like an accumulation of all available information. The stock-market analogy doesn't hold."

    I would argue that on the time-scale that people tend to care about, it seems to hold pretty well. There's been barely any movement this year in most of the races, and the movement that there has been hasn't favored any particular party. See http://poughies.blogspot.com/2010/10/campaigns-ha… .

    I'm not as familiar with the 2008 data, but from my memory, the same was true then.

    Theoretically, with the internet, well-funded opposition research teams, and well-defined partisanship for most of the population, polls *are* a good accumulation of available information. In 1988 they weren't, but today, I see little evidence that experts can outperform polls for most races.

  7. Just to throw one last word:

    This isn't always true. One candidate could, for example, hold all of his money and then only spend it right before election day. A lot of people think that's what happened in the Pennsylvania Senate race this year.

    A Republican candidate with a big lead in a blue state might be expected to revert. But this sort of partisan mean-reversion hasn't actually happened this year: an incredibly right-wing gubernatorial candidate has had steady leads in Maine for the last year. Walt Minnick in Idaho-1, a Democrat who won by 1% in a district where McCain got 62% of the vote, is set to win this year by 40 points.

    Fundamentals-based forecasting models are good priors, but when they disagree with extensive polling, it's more likely that the model is wrong than that public opinion will shift considerably toward the model.

  8. David:

    It's not a matter of experts "outperforming" opinion polls. The experts and the polls are answering different questions. Polls tell you what people are thinking right now, which is of interest in its own right. That said, I agree that if the polls are surprising, compared to existing expectations, then this provides some information.

    In any case, I think our conversation is going in the way that Nate would like, in that we're talking about factors that can influence the election outcome rather than merely looking at short-term trends. I think this was Nate's main point, that a context-free extrapolation doesn't make sense.

  9. If there are fundamental variables which forecast the direction of polls, then wouldn't it be more sensible to test AR vs I(1) on the residual of poll numbers subtracting out the predicted value of the poll numbers regressed on fundamentals? I.e. the question is not whether the raw poll numbers are random walks, but rather whether, conditional on fundamentals, they are random walks?

    Also, a big part of the putative randomness of financial data is that arbitrage will ensure that they are so. I don't see a corresponding force in polling.

  10. Similarly, focusing on day to day changes in financial markets might lead one to argue that if short-term traders increase daily volatility and noise, this practice should be discouraged. That sort of volatility is not relevant to people's lives for the most part, however. Volatility at a monthly and annual level, booms and busts in asset prices, is what matters, and those are arguably more likely if trading is less active. The easiest way to get a 20sd price shock is to just not mark or mis-mark prices for a while.

  11. I was making a different claim, that polls, if house effects are corrected for, do represent EMH-style best estimates of opinion in November. Polls tell you what opinion is right now, but I think they also do close to the best job of predicting what opinion will be in November.

    You claimed that "There really can be predictable changes in the polls", and I don't think that's really true anymore for competitive well-funded races during the last 8-9 months of the campaign. That idea has both theoretical and empirical support. (Technically, I guess I'm claiming that Public Opinion is a martingale)

    The traditional papers that do find such effects are based on all post-war elections and dominated by Gallup polls that were objectively cruddy (more than half of the mid-terms in the commonly used Roper dataset only had Gallup polls in the database). They make predictions that haven't been borne out in at least the last two cycles.

  12. "I believe the random walk story has contributed to major misunderstandings about politics."

    by far my favorite sentence of the post. it seems to express:

    — that you have a particular belief
    — that something contributes to something else (as opposed to implicit monocausal structure)
    — and that it "has" happened.

    imagine if we all expressed everything that way? instead of "i suck at math," we could say, "i believe i have sucked at math." or, instead of, "this is your fault," we have, "i believe you may have contributed to this." i can't argue with the truth content of the statement (well, maybe you don't actually believe it, but assuming you do believe what you claim to believe), the discussion hedges on what our beliefs are with regard to what the contribution is (since we may both believe there are very few actual zeros), instead of taking things personally.

    i approve ;)

  13. Ah… The post-convention bounce, one of my favorite phenomena. We have an external event that moves the opinion polls but not the final result. We also know that polls gyrate far more than results do (at least in U.S. Presidential elections). We also know that most opinion poll individual changes are well within the margin of error. Given the differences in likely voter models, who is to say that there has been any movement at all this year?
