What was going on with the New Hampshire polls?

Yair writes:

Before Iowa, Hillary was beating Obama in NH by like 20 points, or at least double digits. After Iowa, Obama got this huge surge in the polls. You can see the time series here.

It’s a mystery why the polls were so wrong. Here’s my theory (which gets a bit long and technical but might be interesting to some, and I just feel like writing it down). I think it comes down to three parts:

1. The likely voter screen and its potential deficiencies
2. Problems in survey weighting, especially when Iowa turnout was so strange
3. Obama being black

First – Erikson, Panagopoulos, and Wlezien wrote a paper showing that the Gallup poll overestimates fluctuation in the electorate when using the likely voter screen early in the election (paper attached). In a nutshell, what happens is this: because the Gallup poll (and most other polls) are interested in interviewing “likely voters” only, they ask a series of screening questions at the beginning of the poll to gauge the respondents’ interest in the election. They then have some formula to determine who is a “likely voter”, and they throw out the remainder of the results. This paper examined the results that were thrown out along with the poll and found that, when something is going wrong for a candidate, their supporters are less enthusiastic and therefore less likely to be considered “likely voters” during this screening process. As a result, many of the supporters of the “losing” candidate just aren’t counted in the poll, because pollsters think they’re not going to vote. This makes fluctuations in polling seem more dramatic than they actually are.

In this case, Hillary was winning big in NH. When Barack won Iowa and everyone in the media started praising him relentlessly, he started getting a boost. Because of this likely voter screen thing, his boost in the polls was exaggerated, and because the elections were so close to one another, the polls didn’t have a chance to settle down into an equilibrium. This means Obama was never actually leading, and all this talk about “something happened in the last 24 hours” is all a load of BS.
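To make the screening argument concrete, here is a minimal sketch in Python. Every number in it is invented for illustration, and the function name `screened_share` is hypothetical. It is not Gallup's actual likely-voter formula, just a two-candidate toy in which each candidate's supporters pass the screen at a rate that depends on their enthusiasm.

```python
def screened_share(true_share_a, pass_rate_a, pass_rate_b):
    """Candidate A's share among respondents who pass the likely-voter screen.

    true_share_a: A's share among all respondents (two-candidate toy race).
    pass_rate_a, pass_rate_b: fraction of each candidate's supporters who
    are classified as "likely voters" (assumed, not estimated from data).
    """
    kept_a = true_share_a * pass_rate_a
    kept_b = (1.0 - true_share_a) * pass_rate_b
    return kept_a / (kept_a + kept_b)


# Made-up scenario: A's true support rises 5 points after good news,
# and at the same time A's supporters become more enthusiastic while
# B's supporters become less so.
before = screened_share(0.45, pass_rate_a=0.60, pass_rate_b=0.70)
after = screened_share(0.50, pass_rate_a=0.75, pass_rate_b=0.55)

true_swing = 0.50 - 0.45
screened_swing = after - before

print(f"true swing:     {true_swing:+.3f}")
print(f"screened swing: {screened_swing:+.3f}")
```

Under these assumed pass rates the screened poll moves by much more than the underlying support does, which is the exaggeration described above. When both groups pass the screen at the same rate, the screen leaves the estimate unchanged.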

Second – survey weighting. Whenever a pollster does a survey, they need to make the poll representative of the voting electorate (that’s why they do the likely voter screen, for example). Another big thing they do is essentially guess what the demographic makeup of the electorate is going to be. Usually this is done on historical data and census data, but it’s always really hard in primaries because they’re not very consistent. So, for example, usually it’ll be something like 10% of the electorate is people 18-25, and like 25% are 65+, and so on (I’m making these numbers up). So the pollster will first try to get this breakdown in who they actually talk to, and if they can’t, they’ll then “weight” the survey – meaning count certain people more than others – to simulate the expected breakdown.

In this case – I’m guessing here, but I think the pollsters saw how weird the electorate was in Iowa (i.e. SO many people turned out, and so many young people) and probably tried to compensate by weighting young people a ton in the following NH polls. Now, we know that young people support Obama disproportionately. If the pollsters overcompensated for young people, then Obama’s support was artificially strengthened in the polls. I’d have to look at the actual turnout numbers in more detail to check this out.
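The weighting step can be sketched the same way. The Python below is a toy post-stratification with invented support levels and electorate shares (the 10% and 25% figures echo the made-up breakdown above, and everything else is equally hypothetical). It only shows the direction of the effect, not what any pollster actually did.

```python
def weighted_support(support_by_group, target_share):
    """Weighted estimate of a candidate's support.

    support_by_group: candidate's support within each age group (0 to 1).
    target_share: the pollster's assumed share of the electorate per group.
    Shares are renormalized so they need not sum exactly to 1.
    """
    total = sum(target_share.values())
    return sum(
        support_by_group[g] * target_share[g] for g in support_by_group
    ) / total


# Invented support for the candidate who does best among the young.
support = {"18-25": 0.55, "26-64": 0.36, "65+": 0.28}

# Two assumed electorates: a "usual" one, and one tilted toward young
# voters, as a pollster might choose after seeing an unusual turnout.
usual = {"18-25": 0.10, "26-64": 0.65, "65+": 0.25}
young_heavy = {"18-25": 0.20, "26-64": 0.58, "65+": 0.22}

estimate_usual = weighted_support(support, usual)
estimate_young_heavy = weighted_support(support, young_heavy)

print(f"usual weights:       {estimate_usual:.3f}")
print(f"young-heavy weights: {estimate_young_heavy:.3f}")
```

The same interviews give a higher number for the candidate once young respondents are counted more heavily. If the real electorate looks like the usual one, that extra support exists only in the poll.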

Third, Obama is black. Some people have a theory that people will lie in a poll and say they support the black candidate because they don’t want to seem racist, but then they actually vote for the white person. This one is going around in the media already, but I find it hard to believe, or at least I don’t think it’s the only reason for the problems. First, the idea in general seems kind of crazy, that people think it makes sense to lie in this way in large numbers – crazy that it would have such a large effect, anyway. Second, we’re talking about Democratic primary voters, NOT the general electorate. These people are the least likely to be racist. Third, it’s not like the alternative was some gun-toting white guy from the Klan, it was Hillary Clinton. If these people are racist, they’re probably not going to be running to her. Still, this might have had a small effect.

So anyway, that’s my theory. It should be noted that none of this is written or talked about anywhere in the media, which is a shame in my opinion. And if I’m correct, this has huge implications for the election which are going to be ignored. Specifically what I mean is this – these early primaries and caucuses are important not really because of the delegates, but mostly because they build momentum and a storyline for the media to talk about in advance of the future primaries. In this case, the media’s storyline goes something like this … “Obama won Iowa and had all the momentum. Hillary was on the ropes and losing by double digits. But her campaign rallied in the final 24 hours. She ‘found her voice’, showed some resiliency and this is a turning point for her.” Based on the data they’re looking at, this makes sense. Too bad it might be totally wrong.

In reality, I think Hillary was steadily losing ground as Obama was gaining momentum, and it truly is remarkable that Obama closed the gap by so much in the final weeks. If the media saw this, the storyline would be totally different, which has significant effects on the future primaries – donations, momentum, etc. And to me, this storyline makes a lot more sense. I’m sorry, but I just don’t believe that crying on TV in the middle of a speech is good for a presidential campaign. I looked at a bunch of the events on CSPAN in the last week, and I’m telling you, that looked like a campaign on the ropes. She was breaking down, Bill was going crazy, the audiences were NOT enthusiastic at all, and the media coverage was dismal. The theory that “something just happened” in the last 24 hours seems insane to me.

My only comment is that things are a lot less stable when there are several candidates in the race to choose from. Even if the main focus is on #1 and #2, there are a lot of these other options floating around that make the decision more complicated and the outcome less predictable.

P.S. Daniel Lippman points us to these three news articles about how the polls got things wrong: 1 2 3.

P.P.S. More here.

9 thoughts on “What was going on with the New Hampshire polls?”

  1. These results suggest that what we saw in the last presidential election is in fact a more common phenomenon than many believe. It is interesting to note that many of the reasons offered then are not being offered now.

  2. Bee,

    I suspect the "likely voter screen" and related adjustment issues are more important in the primary than in the general election, because turnout in primaries is typically much lower.

  3. I think Yair makes a really good point about the consequences of pollsters being "wrong". It is reasonable to believe that a person's evaluation of a candidate is at least partially determined by that person's perceptions of the candidate's popularity. If candidate A does better than the polls predict, then people have reason to update their own beliefs about candidate A's popularity and thus update their evaluation of candidate A. If weight given to popularity is sufficiently high in evaluating candidates, people who are indifferent between candidates A and B or very slightly favor B will all switch to supporting A. In a close race that is contested over time, as these primaries are, this dynamic could affect who wins.

    I'm curious about the "screening" that is mentioned in Yair's first hypothesis. Is it that the "low likelihood voters" are thrown out, or that all voters are weighted by their voting propensity? I should look at the cited paper, but just wondering if anyone knows off hand.

    Finally, when polls are reporting their "margins of error", isn't it true that poll organizations usually report a margin of error based on standard errors for a binomial distribution calculated near the 50-50 mark? Is there any reason to adjust these when we actually have a multichotomous choice situation where support levels are between 0 and 40%? Would the margins actually be wider? Also, if weights are estimated, then I imagine the margins should be adjusted to reflect the uncertainty in the weight estimates. I get the sense that this is not part of the usual practice. Anyone know?

  4. Second, we're talking about Democratic primary voters, NOT the general electorate. These people are the least likely to be racist.

    That's pretty funny.

    They're the most likely to be obsessed with race.

    And they're the most likely to project racism onto other people, and thus to factor it into "electability".

  5. I would like to see some support for the statement that "Democratic Primary voters … are least likely to be racist." Also, I do not think that saying you like one candidate, then voting for another is racist. There are racial elements here, but "racist" is a strong word.

  6. I don't think your comment about Democratic voters being less likely to be racist is correct.

    You could just as easily say that Democratic voters are the ones who most want people to think they are not racist, and thus lie more often.

  7. To Cyrus's question, it varies by pollster (in my little bit of experience). Generally, though, the screen is binary: you're deemed "likely" and then get included, or you're deemed "unlikely" and bounced from the survey. Oftentimes, this is done up front so that the pollster doesn't bear the cost of interviewing an "unlikely" (interviewing being the biggest cost pollsters have, they are pretty vigilant here).

  8. With NH allowing independents to vote in either primary, I think the simple explanation is that McCain beat Obama in the race for the center.

    I bet McCain wins in Michigan today for the same reason (especially with the D race a no-show)

  9. Consider that Hillary got more votes than expected, but the predictions for other candidates (in both the Republican and Democratic races) were accurate. Doesn't this dismiss a lot of theories?

    The Bradley effect, for instance. It might have explained a lower support for Obama than expected, but that's not what happened.

    Weighting bias is also weakened as an explanation. If different people went to vote than were polled, how come all the other predictions were so spot on? What demographic group consists of only Hillary supporters?

    Well, there's one, obviously. Hillary supporters. Something brought them out in greater numbers than expected, without affecting the turnout for the other candidates significantly. It could be "crying on TV", or an extremely successful get-out-the-vote effort. Whatever it was, I think it's unlikely that many people in the Hillary campaign thought it would work – for if they had, I'd have thought their confidence would have affected the prediction markets.
