## Different election forecasts not so different

Yeah, I know, I need to work some on the clickbait titles . . .

Anyway, people keep asking me why different election forecasts are so different. At the time of this writing, Nate Silver gives Clinton a 66.2% [ugh! See Pedants Corner below] chance of winning the election while Drew Linzer, for example, gives her an 87% chance.

So . . . whassup? In this post from last week, we discussed some of the incentives operating for Nate and other forecasters.

Here I want to talk briefly about the math. Or, I should say, the probability theory. The short story is that small differences in the forecast map to apparently large differences in probabilities. As a result, what look like big disagreements (66% compared to 87%!) don’t mean as much as you might think.

One way to see this is to look not at each candidate’s win probability but at forecast vote share.

Here’s Nate:

Here’s Pollster.com:

And here’s Pierre-Antoine Kremp’s open-source version of Linzer’s model:

It’s hard to see at this level of resolution but Pierre’s forecast gives Clinton 52.5% of the two-party vote, which is not far from Pollster.com (.476/(.476+.423) = 52.9% of the two-party vote) and Nate Silver (.485/(.485+.454) = 51.7% of the two-party vote).
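For anyone who wants to check that arithmetic, here is a minimal sketch in R. The helper name `two_party_share` is made up for illustration; the inputs are just the point estimates quoted above.

```r
# Two-party share: one candidate's share divided by the sum of both
# major-party shares (third parties and undecideds are dropped).
# `two_party_share` is a hypothetical helper, not from any forecaster's code.
two_party_share <- function(clinton, trump) clinton / (clinton + trump)

pollster <- two_party_share(0.476, 0.423)
nate     <- two_party_share(0.485, 0.454)

# Express as percentages, rounded to one decimal place
round(100 * c(pollster = pollster, nate = nate), 1)
```

This gives 52.9 for Pollster.com and 51.7 for Nate, matching the numbers in the text.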

That’s right: Nate and the others differ by about 1 percentage point in their forecasts of Clinton’s vote share. That isn’t a lot: it’s well within any margin of error even after you’ve averaged tons of polls, because nonsampling error doesn’t average to zero.

So, argue about these different forecasts all you want, but from the standpoint of evidence they’re not nearly as different as they look on the probability scale.

To put it another way: suppose the election happens and Hillary Clinton receives 52% of the two-party vote. Or 51%. Or 53%. It’s not like then we’ll be able to adjudicate between the different forecasts and say, Nate was right or Drew was right or whatever. And we can’t get much out of using the 50 state outcomes as a calibration exercise. They’re just too damn correlated.
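To see why correlated state outcomes carry so little independent information, here is a rough simulation sketch in R. All the numbers in it (the two standard deviations and the state baselines) are hypothetical choices for illustration, not values from any of the forecasts discussed here.

```r
set.seed(1)
n_sims <- 10000

# Hypothetical error scales (assumptions, not taken from any real model)
national_sd <- 0.03   # sd of the national swing shared by every state
state_sd    <- 0.015  # sd of each state's own idiosyncratic error

# Every state gets the same national swing plus its own noise
national_swing <- rnorm(n_sims, 0, national_sd)
state_a <- 0.52 + national_swing + rnorm(n_sims, 0, state_sd)
state_b <- 0.49 + national_swing + rnorm(n_sims, 0, state_sd)

# Correlation implied by the shared swing, and the simulated value
implied_cor   <- national_sd^2 / (national_sd^2 + state_sd^2)
simulated_cor <- cor(state_a, state_b)
c(implied = implied_cor, simulated = simulated_cor)
```

Under these assumed scales the implied correlation between any two states is 0.8, so fifty state results behave much more like a handful of independent observations than like fifty.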

P.S. All these poll aggregators have been jumping around because of differential nonresponse. If you use polls’ reported summaries as your input, as all these methods do, you can’t avoid this problem. The way to smooth out these jumps is to adjust for the partisan composition of the surveys.

Pedants Corner: As I discussed a few years ago, reporting probabilities such as “66.2%” is just ridiculous in this context. It’s innumerate, as if you went around saying that you weigh 143.217 pounds. When this point came up in the 2012 election, a commenter wrote, “Based on what Nate Silver said in his Daily Show appearance I surmise he’d be one of the first to agree that it’s completely silly to expect daily commentary on decimal-point shifts in model projections to be meaningful…. yet it seems to be that his deal with NYT requires producing such commentary.” But I guess that’s not the case because Nate now has his own site and he still reports his probabilities to this meaningless precision. No big deal but I’m 66.24326382364829790019279981238980080123% sure he’s doing the wrong thing. And I say this as an occasional collaborator of Nate who respects him a lot.

1. Noah Motion says:

Okay, so the national vote share is similar across models, but the state-level vote share clearly isn’t. The end result is determined by the electoral college, not the national vote share, so this distinction is important and, I assume, what explains the different predictions from the different models. Silver et al have written and talked about between-state correlations being higher in their model than in most others, which may be the main driver of these differences. Of course, because the 538 model isn’t transparent to anyone outside of 538, we can’t know for sure.

• Andrew says:

Noah:

Drew Linzer’s models and others capture state-level correlation just fine. You can go to Kremp’s page and see all the details. In any case, almost all the uncertainty in the election forecast comes from uncertainty in the national swing.

• Noah Motion says:

If almost all the uncertainty is coming from the national level, then why do models that have such similar national-level estimates produce such disparate predictions for the outcome of the election? Also, Silver et al have written that the state-level polls are their primary inputs, with national polls playing a more secondary role, if I’m not mistaken. It’s not clear how to make this compatible with almost all of the uncertainty coming from national-level data. Of course, we can’t actually look at the 538 model directly, so who knows what’s really going on with it?

• Andrew says:

Noah:

Regarding state and national polls, I recommend you go to Kremp’s site which describes his model (adapted from Drew Linzer’s model) and also has all the code. Short answer is that it’s best to use both national and state polls. It’s not trivial but it’s not so hard, either, to combine state and national polls using hierarchical modeling. Again, Drew discusses this in his published paper and Kremp has it online with all the polls. I’m not particularly interested in the details of Nate Silver’s model, but my point in the above post is that small differences in the models (a shift of 1% in Clinton’s vote share, and slightly different uncertainty distributions for the national swing) can lead mathematically to apparently large differences in forecast probability.

• David says:

“small differences … can lead mathematically to apparently large differences ….” i.e. sensitivity to initial conditions, as far as the system generating the forecast probability.

• elin says:

If the national outcome is a binary variable, Clinton or Trump, and we assume that any predicted probability for a candidate greater than .5 means that they are predicted to be the winner, have we really seen a lot of variation? That’s not my impression. Variation on size of win, yes; variation on probability of win, yes; but variation on who will win? Not so much. There is a reason spread bets and money line bets both exist. In the end, after the fact, the probability for this specific event will be known to be 1 or 0 so you could say that the best model is the one that is giving predicted probability closest to 1 or 0. I actually wouldn’t do that in this context (because the results of voting by millions of people are different than individual votes), but you could. We should also remember the way the slope changes in a binary model depending on where you measure it, with very high slopes if you are in the middle, moving toward 0 as you get to the extremes. If you have a model with predicted probability around .50 then yes it is going to appear to swing wildly when there are changes to the inputs.

2. to me, 66.2% is a lot closer to reporting your weight as 143 pounds than 143.217

Sure, 66.2 is more precise than warranted, but rounding to the nearest percentage point means you get things like 66,66,66,66,66,67 and everyone says “OOH there was an uptick!” when in fact the number might well have been 66.2,66.4,66.4,66.3,66.3,66.5

in something as silly as political media, creating an opportunity for roundoff errors to cause news seems silly (that is, a jump from 66 to 67 caused by 66.3 becoming 66.5)

so, I’d be totally there with you if he’d reported 66.2433 but 66.2 I can get behind

• Andrew says:

Daniel:

In the presidential election context, presenting a win probability to a tenth of a percentage point is much much more precise than predicting the vote proportion to a tenth of a percentage point.

It turns out that specifying win probability to a tenth of a percentage point is equivalent to specifying forecast vote share to an accuracy of 0.004 percentage points; that’s 0.00004 of the vote share. So it really is like that 143.217.
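A rough way to reproduce this kind of calculation, as a sketch only: assume a normal forecast for the two-party vote share and ask how far the forecast mean must move to shift the win probability by 0.1 percentage points. The standard deviation of 3 points used here is an assumption, and the answer scales with it, so this will not necessarily return exactly the 0.004 quoted above.

```r
# Assumed normal forecast for the two-party vote share
forecast_sd <- 0.03   # assumption: 3 percentage points
win_prob    <- 0.662  # the reported win probability

# Forecast mean implied by that win probability:
# win_prob = pnorm((m - 0.5) / forecast_sd)
m <- 0.5 + forecast_sd * qnorm(win_prob)

# Slope of the win probability with respect to the forecast mean
slope <- dnorm((m - 0.5) / forecast_sd) / forecast_sd

# Shift in forecast vote share that moves the win probability by 0.001
delta_share <- 0.001 / slope
delta_share
```

With these assumptions the required shift is under a hundredth of a percentage point of vote share, the same order of magnitude as the figure above; a smaller assumed standard deviation makes it smaller still.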

• perhaps, and I have to admit that I was thinking vote share not win probability… but also I think there’s a bit of psychology involved here that shouldn’t be ignored. People are used to things having meaningful measurements to 2 sig figs, like my weight, or the time. People don’t generally say it’s 8:31 pm, they say 8:30; they don’t generally say “I bought 15.58 gallons of gasoline,” they say “I bought 15 gallons of gasoline.” They expect that a change in the second significant figure is meaningful (if I think that I weigh 180 pounds and step on the scale and notice now I’m 190 pounds, that’s a big deal). Given that background, being excessively precise and noisy can be helpful for non-numerate people to interpret what’s going on.

If you try to measure a paperclip on one of those lab balances that reads to the microgram, you’ll see all the digits to the right of the 100th of a gram oscillating wildly as you breathe on the scale (for this reason those super fancy balances have little hoods with a door you can close so air currents don’t affect your measurements). This rapid changing is a good indicator that those digits are meaningless.

So, to non-numerate people a repeated measurement with high precision and lots of visible noise is somehow more helpful than a stable low precision measurement. In fact, I now think that Nate Silver should push out 300 random draws in a little json array on his webpage and have a function that 3 times every second chooses a random one to display, his results would dynamically indicate the uncertainty in a way similar to someone trying to weigh a feather on a microgram balance outdoors.

• Andrew says:

Daniel:

If you want a json interface etc., I recommend you forget about Nate’s forecast and just plug something in directly to Kremp’s which is all in R and Stan and is on Github. Why waste your time for something you can’t really access?

In 2008 I wanted to compute the probability of a voter in any state being decisive in the election. I didn’t feel like running my own forecast so I emailed Nate, who sent me 10,000 simulation draws from his model. I used this to compute probabilities and then I wrote up a couple of papers on this, including Nate as a coauthor which was only fair as he gave me those numbers.

In 2012 I wanted to do these calculations again so I emailed Nate to ask for a matrix of simulations and he never responded.

In 2016 I did it with Kremp.

• I was more making a recommendation about how to communicate uncertainty to the general public as an alternative to round-off rather than actually suggesting something I’d like to do with Nate’s numbers.

Kremp’s stuff is great though; I like that time-series plot a lot. I did try to do some Gaussian process regression with the state-space model you posted a month or so back, but Stan wasn’t quite up to handling the 20,000 element covariance matrix.

• Martha (Smith) says:

Daniel said,

“People are used to things having meaningful measurements to 2 sig figs, like my weight, … (if I think that I weigh 180 pounds and step on the scale and notice now i’m 190 pounds, that’s a big deal).”

If I think I weigh 110 pounds and step on the scale and notice it says 115, that’s a big deal.

If you have congestive heart failure and have a weight gain of 3 pounds in one day, or 5 pounds in 1 week, you are supposed to call your doctor’s office.

• Sure, so lots of measurements have even 3 sig figs, my point still stands, people aren’t used to the idea that maybe you’d do a poll and estimate 64% probability of a win, and you’d really mean somewhere between 60 and 70

Giving more sig figs and then also giving lots of estimates seems to me to be better than just rounding to say 60 and then when your estimate bumps from 64.3 to 65.5 you bump from 60 to 70! wow that’s a big jump!

I’d rather see something like “we estimate that the probability of a win is: 62.4, 66.3, 61.1, 68.3, 65.2,….” bouncing around on screen every few seconds. This is better than rounding off to 1 sig fig in my opinion.

• Also, regarding the specifics of weight gain, a lot depends on the situation. A disposable 1 liter bottled water weighs about 2 pounds. I drink one of those in about 2 minutes after being out in the SoCal sun for a while. This is a weight gain rate of 1440 lbs/day ;-)

When I was in college in Iowa I’d walk to the student union in -25 F weather, when I arrived at the student union it’d be heated to about 80F a temperature change of 105F, I’d rapidly strip off about 5 pounds of clothes in a minute or so.

• Martha (Smith) says:

If you had congestive heart failure, you could be required to limit your daily fluid intake to 1.5 liters, even if it’s really hot out — and if you were not good at meeting that limit, you’d be prescribed a diuretic to reduce any net fluid gain.

• Toby says:

Andrew:

What about significant digits? If the data are that precise, then I don’t see what the fuss is about. If you have a poll with N = 1,000 and 560 vote Hillary and 440 vote Trump, then I see no problem with reporting 76.4 % as the point estimate of the probability that Hillary will win. All that you’re doing when you round it to 76 % is picking a different point on the number line. The less numerate reader will not distinguish between 76.0 and 76 %, and the more numerate reader will realize that it’s a point estimate.

• D. Stephen Voss says:

This strikes me as a communication question more than a statistical one. Too many digits become unwieldy, but too few become a potential source of confusion.

If I refer to Silver’s 64.9% or 65.3% Clinton forecast, and someone else sees that number on her screen as well, then she’ll figure we’re looking at the same thing. If I refer to 64.9% and she sees 65.3%, it will alert us that we’re not on the same page. Merely saying 65% requires more-elaborated communication in the first instance and risks error in the second.

The extra digit is an inexpensive way to reduce ambiguity in communication.

• Andrew says:

Stephen:

I don’t think those extra digits are so inexpensive! Attention is not cheap. My preferred way to reduce ambiguity in communication is through transparency in data and methods as is done by Kremp.

3. Daniel Hawkins says:

As I’ve mentioned before, how have you quantified the extent to which voter intention contaminates voter identification? In the Slate article, you said:

“Again, not only is party identification a very stable variable, but most of these polls ask party identification at the end of the poll, far away from voter intention, to avoid any contamination by how the respondents answer voter intention.”

But I fail to see how putting the ID question at the end of the poll would ameliorate *any* contamination, as you state. It seems quite reasonable that voters who are leaners or independents are likely to have a sincere change in how they identify, based on the latest scandal. Former Republican leaners may have identified as independents after the leaked Trump tapes from 2005. And former independents may have switched to Democrat leaners. And vice versa when there is a flare up over Clinton emails.

How have you quantified the effect of the party identification boundaries shifting slightly, as a reflection of voter intention? It seems odd to assume that there is *no* effect, which is what you’re doing.

I do agree that voter nonresponse is likely a large portion of the change in polls, but based on what you’ve written, I don’t think you’ve sufficiently ruled out that there are real changes to the party ID question, as a proxy for how people are feeling about the candidates. By setting that effect to zero, you risk overcompensating for differential nonresponse, and fail to capture actual swings in public opinion.

Party ID may be stable long term, but short term fluctuations (i.e. real fluctuations, not just differential nonresponse) could have significant impacts on elections.

• Andrew says:

Daniel:

We discuss this in our paper. In panel surveys such as what we did on the Xbox you can poststratify by party ID and previous-vote measures that were taken at the beginning of the series, and you see that only a very small proportion of voters are changing their opinions.

• Rahul says:

+1

This is exactly the part that I never understood.

• elin says:

You don’t start reporting you are a different religion just because your religion has a scandal. You report no religion or you don’t report a religion. Identities are just not that fluid.

Even the changers Andrew found could be measurement error, just like possibly some people who change genders really change genders but sometimes people click on the wrong circle.

• Curious says:

I think your assumption is a bit strong. Why would you think there would not be anyone who would swing from one extreme to the other when there are certainly people who tend towards that type of thinking and behavior? People who are described as “black & white thinkers” who dismiss shades of gray might be likely to change their reported affiliation from one category to another skipping the ‘none of the above’ option.

• Rahul says:

So, it’s an “assumption” not an “observation”?

We take it axiomatically that any changes must be differential non-response & not a fundamental opinion change?

• elin says:

There’s test retest data from panel studies, as Andrew has said, but there is also broad data on the importance of central identities of all kinds. Like I said, some people do change their reported gender identities and that is very real, but some people also make data entry mistakes. Some people experience religious conversion. But more people just fade out of their childhood religious identities.

• Rahul says:

True but is it unreasonable to assume that political identity is more fluid than gender or religion?

• Elin says:

It would certainly be an interesting study to do, given third party and independent options and how US states differ in terms of open and closed primaries. And they might not be uncorrelated, either. Of course the shift of African American voters from Republican to Democratic and of Southern whites from Democratic to Republican is the obvious set of examples to look at, but those were not casual individual decisions and took many years to play out. It will be extremely interesting to watch how this election plays out in that respect since you have many high visibility Republicans say they are voting top of the ticket Democratic but the rest Republican; obviously in that case PartyID would still be Republican. What happens after that, we shall see.

4. Jim says:

Andrew – I think I understand what you are trying to say in this post, but it is written in a way that I don’t think makes sense.

You write that “small differences in the forecast map to apparently large differences in probabilities.” I don’t think this is right at all. If Nate Silver had a 1% higher overall forecast, that would NOT move his probability up from 65% to 85%. The differences are the assumptions of the model (e.g. interstate correlation of polling error across states).

Your post makes it sound like the difference in probabilities is driven by the fact that small differences in overall forecasts can lead to big swings in probability. This simply isn’t the case.

• Andrew says:

Jim:

Different models differ in all sorts of ways but a big difference here is forecast vote share.

Just to illustrate, here’s a quick calculation in R computing win probabilities from a normal distribution with a point estimate of 51.5% or 52.5% of the two-party vote and a forecast standard deviation of 3 percentage points:

```r
# Win probability = Pr(two-party vote share > 0.5) under a normal forecast
1 - pnorm(.5, .515, .03)  # about 0.69
1 - pnorm(.5, .525, .03)  # about 0.80
```

So, here, a 1% shift in overall forecast leads to a shift in probability from 69% to 80%. Not quite from 65% to 85%, so I agree there’s other stuff going on; I assume that Nate is using a higher predictive uncertainty for the national swing, which will give him higher forecast standard errors for all aspects of the election outcome and will bring his probabilities closer to 50%.

All the models that operate at the state level have correlations between states: you need to because almost all the uncertainty here is driven by uncertainty in the national swing. But in any case my point is that small differences in the forecast lead to apparently large differences in win probability.
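To make the role of the forecast uncertainty concrete, the same normal calculation can be run over a small grid of means and standard deviations. This is only a sketch: the standard deviations of 2 and 5 points are hypothetical values chosen to bracket the 3 points used above, not numbers taken from any forecaster.

```r
# Forecast means (two-party vote share) and a range of assumed sds
means <- c(0.515, 0.525)
sds   <- c(0.02, 0.03, 0.05)  # 0.02 and 0.05 are hypothetical alternatives

# Win probability for each (mean, sd) pair under a normal forecast
win_prob <- outer(means, sds, function(m, s) 1 - pnorm(0.5, m, s))
dimnames(win_prob) <- list(mean = as.character(means), sd = as.character(sds))

round(win_prob, 2)
```

Reading across a row shows how a larger standard deviation pulls the win probability toward 50% with the point forecast held fixed; reading down a column shows the effect of the one-point shift in vote share.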

• Dalton says:

My impression was that a major reason the 538 win percentage is lower is because the variance is greater. The Clinton win percentage is directly calculated as: 1 – Pr(Electoral Vote < 270). Most of the prediction sites display histograms of the simulations from their models. These histograms consistently show a more diffuse distribution for Nate's forecast. So Nate's methods have the highest Pr(EV < 270) but also allow for a greater probability of a Clinton landslide. I think this is probably more of a cause of the different win percentages in the different forecasts than a small shift in the central tendency. Sam Wang who has the highest probability of a Clinton win also has the narrowest range of possible outcomes for electoral college votes and Nate (lowest probability of a Clinton win) has the widest.
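As a sketch of that calculation, here is how a win probability and a landslide probability would be read off a set of simulated electoral-vote totals. The draws below are hypothetical normal stand-ins (same center, different spread), not output from 538 or any other model, and the 370-vote landslide threshold is an arbitrary choice for illustration.

```r
set.seed(2016)
n_sims <- 10000

# Hypothetical simulated Clinton electoral votes, clamped to the valid range
sim_ev <- function(center, spread) {
  pmin(538, pmax(0, round(rnorm(n_sims, center, spread))))
}
ev_narrow <- sim_ev(310, 30)  # tighter forecast distribution
ev_wide   <- sim_ev(310, 60)  # more diffuse forecast distribution

# Win probability = 1 - Pr(EV < 270); landslide threshold is arbitrary
win_prob       <- function(ev) mean(ev >= 270)
landslide_prob <- function(ev) mean(ev >= 370)

rbind(
  narrow = c(win = win_prob(ev_narrow), landslide = landslide_prob(ev_narrow)),
  wide   = c(win = win_prob(ev_wide),   landslide = landslide_prob(ev_wide))
)
```

With the center held fixed, the wider distribution gives both a lower win probability and a higher landslide probability, which is the pattern described in the histograms.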

Silver has written in a few places that his model uses a wider-tailed distribution than many of the other models (I think he said he used a t-distribution while other models used a normal), and this makes his forecast more “conservative” in the sense that his probabilities will be closer to 50%. He argues that this is a good idea since there are enough unknown unknowns running around that we want a wider probability distribution.
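A small sketch of the tail effect, assuming the same center and scale for both distributions. The 52.5% center and 3-point scale are illustrative, and the 4 degrees of freedom are a hypothetical choice; nothing here is taken from 538’s actual model.

```r
# Same center and scale for both forecasts (illustrative values)
center <- 0.525
scale  <- 0.03
z <- (center - 0.5) / scale  # standardized distance from a tie

# Win probability under a normal and under a heavier-tailed t distribution
p_normal <- pnorm(z)
p_t      <- pt(z, df = 4)    # df = 4 is a hypothetical choice

round(c(normal = p_normal, t = p_t), 3)
```

For a candidate who is ahead, the t distribution puts more mass in the far tail on the losing side, so the win probability lands closer to 50% than under the normal with the same scale.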

As for your original claim: my favorite way to think about this comes from a Yglesias column a while back, where he suggested that Silver’s big innovation isn’t mathematically modelling win probabilities (which is a pretty standard statistical exercise) but rather figuring out how to package this as news that people will be interested in consuming. And the extra digit is part of that.

When the 538 crew talks about their projections, they describe them as “About two to one” or whatever. But no one would take them seriously if they published a model whose output was “about 2-1”, and also no one would refresh the page 20 times a day. I think it’s in the same category as the baseball statistics about “this guy is 3-for-4 against this pitcher this season,” which is statistically meaningless but gives you something to talk about during what would otherwise be dead time.

5. TeddyVienna says:

Not really his “own” site — he’s within the ESPN family. Not saying he’s sensationalizing the race, but there is a temptation to make things more dramatic than they really are. Others less concerned about traffic peg the race at 98-99 percent.

6. anon says:

> So, argue about these different forecasts all you want, but from the standpoint of evidence they’re not nearly as different as they look on the probability scale.

Doesn’t that mean at least one of those modellers must have really screwed up the mapping of the evidence into probabilities? 65% and 85% are very different probabilities.

The entire point of computing the probability of an event as opposed to being content with point estimates seems to me that probability can quantify the extent to which the information we consider supports a claim.

If I knew only the point estimates for Clinton and Trump and it’s 48.5% and 45.4% or whatever, my immediate question would be: What’s the distribution of the difference, in particular, what’s the probability that C>T?

If as you say,

> That’s right: Nate and the others differ by about 1% in their forecasts of Clinton’s vote share. 1% isn’t a lot, it’s well within any margin of error even after you’ve averaged tons of polls, because nonsampling error doesn’t average to zero.

then this source of uncertainty should be incorporated and presumably lead to a more dispersed outcome distribution and thus less certainty as to who will win.

• Andrew says:

Anon:

I don’t think anyone is screwing up. Yes, 65% != 85% but the difference between these forecasts depends on essentially untestable assumptions about variation in the national swing. Different assumptions give different conclusions, it’s as simple as that.

• elin says:

I think that the point of computing the probability of an event is that the outcome is binary. Also that if there are 3 or more candidates in an election that does not require a plurality predicting the percent for a given candidate is not necessarily going to tell you if that candidate wins, which is what we really care about, especially if we are betting.

7. Doug Hess says:

8. Mark Palko says:

For me, the multiple daily updates at 538 suggest false precision more than the “.2” does. They had at least two since I did my Nevada post late last night. I don’t think the situation changed that much over a Saturday night (Of course, we did gain an hour).

• Cliff AB says:

The classic statistician’s dilemma: do you report what the customer is interested in or what they should be interested in?

• Toby says:

Mark:

If there is new information that results in an update, then something has changed that much. How much time has passed seems irrelevant.

• Mark Palko says:

Toby,

It’s not the amount of time; it’s the frequency. If there’s important information, you should update your forecast immediately, even if your last update was five minutes ago, but 538 updates at a rate that doesn’t seem to be justified by the flow of information. If that’s the case, the estimates aren’t actually being refreshed; they are simply being made to look fresher.

• Toby says:

What do you mean by fresher? If they are different it seems that there must be new information. Or do they use a random number generator as an input?

• Mark Palko says:

1. “Fresher” in the sense of having been updated more recently. When comparing, for example, 538 and the Upshot, the casual reader will probably give more weight to the fresher update. It may also have an SEO advantage.

2. “If they are different” is not a very useful standard. The updates of the past 24 hours did change the forecast, but only from 64.7% to 64.9%. I think there’s a good chance that the new information was fairly trivial.

But “Nothing has really changed” is itself valuable information. The forecast updates every time they get new polls and add them in; if there are new polls there is new information. If nothing else changes, that means the new information is very similar to the old information. But “nothing has changed in the past 24 hours/1 hour/20 minutes” is itself information.

• Mark Palko says:

No, nothing changes means either “the new information is very similar to the old information” OR the new information is trivial and is given little weight by the model. My concern is with the second case.

• D.O. says:

I think, they are updating every time a new poll comes in. Maybe it is too often, but it is a fairly objective procedure.

9. Shravan says:

One thing I don’t understand is, in the twitterstorm with Huffpost, Nate Silver said something to the effect that they publicly justify their weights, which suggests open source. But Andrew said in his Slate piece on the open source Stan model that Nate has a “secret sauce”. Both of those statements cannot be true.

• anon says:

538’s methodology section doesn’t give a complete mathematical description of what they do with the polls or what the model looks like, and they don’t provide the code. It’s not open source.

• Shravan says:

What does Nate Silver have to gain by hiding stuff?

• Andrew says:

Shravan:

I wouldn’t call it “hiding stuff” to not post one’s model and code. To call it “hiding” seems to imply that everything is open and available by default and that some special effort would be required to hide it. Really, though, the default is for people to not describe exactly what they’re doing, and to not share code. That’s too bad, but Nate is following standard practice here by putting up results without providing the materials by which others could replicate it.

Short-term, it’s harder to put all your code and methods out there, as this puts some pressure on you to write things up clearly, clean up your code so it can be run by others, etc. Kremp did this but I think it took a bit of effort on his part.

There’s also a business reason, or at least a perceived business reason, for people not to share methods and codes. Maybe Nate and his employers don’t want someone to start a copycat site. It’s easy for me to say how great open-source is; I have a salary, and our work on Stan is supported by various research grants. Nate and his colleagues are living in the commercial media world, and maybe they feel like it’s a selling point that their method has secrets.

Longer term, I think openness is good because it can allow outsiders to find problems in your code and methods so you can improve. But Nate hasn’t seemed to be so interested in getting constructive comments from outsiders. He performs self-evaluation when he’s been wrong, and that’s good, but I can’t recall any examples where he thanked others for finding problems with what he’s doing.

One way I’ve tried to align the short and long-term incentives is by publicly celebrating Kremp’s open-source efforts.

• Shravan says:

So he monetizes his website and predictions? (Sorry, I’m like you, with tenure and with a salary, I have nothing to worry about when I release my data+code). I was pretty frustrated when I read his book that he would not talk about the details and provide accompanying code.

I really admire this other guy who’s laying it all out publicly.

10. Olav says:

Sam Wang has Clinton’s win probability at >99 percent. That seems very different from Silver’s forecast.

• Paul Alper says:

Below is a link to Sam Wang’s discussion of his criticism of both the Huffington Post and fivethirtyeight:

http://election.princeton.edu/2016/11/06/is-99-a-reasonable-probability/#more-18522

“the Huffington Post claim that FiveThirtyEight is biased toward Trump is probably wrong. It’s not that they like Trump – it’s that they are biased away from the frontrunner, whoever that is at any given moment. And this year, the frontrunner happens to be Hillary Clinton.

And then there is the question of why the FiveThirtyEight forecast has been so volatile. This may have to do with their use of national polls to compensate for the slowness of state polls to arrive. Because state opinion only correlates partially with national opinion, there is a risk of overcorrection. Think of it as oversteering a boat or a car.”

Wang further claims his

“approach has multiple advantages, not least of which is that it automatically sorts out uncorrelated and correlated changes between states. As the snapshot changes from day to day, unrelated fluctuations between states (such as random sampling error) get averaged out. At the same time, if a change is correlated among states, the whole snapshot moves.

The snapshot gets converted to a Meta-Margin, which is defined as how far all polls would have to move, in the same direction, to create a perfect toss-up. The Meta-Margin is great because it has units that we can all understand: a percentage lead. At the moment, the Meta-Margin is Clinton +2.6%.”

• Andrew says:

Paul:

It’s all dominated by Kremp’s model, as far as I’m concerned. The beauty of Bayesian inference is that you don’t have to think about things like oversteering or meta-margins; you just set up your model and fit it to data.

11. Drew says:

It seems that the mean at this point isn’t relevant since most people have it reasonably close. If you’re pricing a basket of digital options this close to “maturity”, what matters is the final variance and the joint movement of each of the “stocks” (states). My guess from looking at Nate’s graph is that he had a lot less variance earlier and plugged in a lot more towards the end, which is why he was so aggressive earlier and is so conservative now. Maybe that’s actually the right thing to do; I’m not sure why everyone is so concerned with where the spot value of the poll is when it’s reasonably close to 50% for everyone.

12. Terry says:

The big question in my mind is whether these probabilities are adequately acknowledging model risk. For instance, uncertainty about the following should be factored in:

As phone technology changes, are polls accurately sampling actual voters?

Hillary has much more money to get out the vote. How much will that increase her vote? Respondents who say they will vote for Trump, but don’t make it to the polls because they don’t have a ride or because they aren’t exhorted enough aren’t actually voters.

How big is the “shy Trump voter” effect?

Nate Silver seems to be recognizing this when he says “there are enough unknown unknowns running around that we want a wider probability distribution”. But is he still underestimating model risk?

I don’t know the answer, but I suspect pollsters are insufficiently humble about the magnitude of the flaws in their models.

• Andrew says:

Terry:

Yes, I think it’s definitely important to add a factor into the model to allow for nonsampling errors of various sources, as well as the possibility of genuine last-minute shifts in political opinions.

13. Given that my quantity of interest is the probability of a candidate winning, 66% vs. 87% seems like a huge disagreement. It’ll have a rather large effect on any decision-theoretic analysis. Not being a political scientist or demographer, I don’t really care about vote share.

So my question is: what should the authors have done rather than reporting a 66% or 87% chance of Clinton winning? Evaluating the event probability is easy:

$latex \mbox{Pr}[\mbox{clinton wins}] = \int_{\theta} \mbox{Pr}[\mbox{clinton wins} \, | \, \theta] \, p(\theta \, | \, y) \, \mbox{d}\theta$

where $latex y$ is data and $latex \theta$ are model parameters. But that only gives you an event probability estimate, like 66%, not any kind of uncertainty in the event estimate.
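As a rough sketch of how that integral is usually approximated: draw values of the parameters from the posterior and average the conditional win probability over the draws. Everything specific here is an assumption for illustration: the parameter is taken to be the national two-party vote share, the posterior is a made-up normal(52, 2), and the candidate is assumed to win exactly when the share exceeds 50.

```python
import random

def estimate_win_probability(posterior_draws, win_prob_given_theta):
    """Monte Carlo estimate: average Pr[win | theta] over draws from p(theta | y)."""
    values = [win_prob_given_theta(theta) for theta in posterior_draws]
    return sum(values) / len(values)

rng = random.Random(2016)  # fixed seed so the sketch is reproducible

# Hypothetical posterior draws for theta (two-party vote share, in percent).
draws = [rng.gauss(52.0, 2.0) for _ in range(20000)]

# Hypothetical rule: the candidate wins if and only if theta exceeds 50.
estimate = estimate_win_probability(draws, lambda theta: 1.0 if theta > 50.0 else 0.0)
print(round(estimate, 3))
```

Under these assumptions the estimate should land close to 0.84. The Monte Carlo error shrinks as the number of draws grows, but that is simulation error only; it says nothing about whether the assumed posterior is right.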

• Andrew says:

Bob:

I agree that 66% vs. 87% is a disagreement; my point is that this disagreement comes from essentially uncheckable aspects of the model.

I think it’s fine that Silver, Kremp, etc., give win probabilities. All they can really do is state their assumptions, say where their data came from, and give their conclusions. It just turns out that this particular conclusion depends a bit on parameters which are inherently difficult to estimate given that presidential elections don’t happen very often.

• Andrew says:

P.S. You write, “my quantity of interest is the probability of a candidate winning.” Strictly speaking (using BDA terminology), the qoi is the event that your candidate wins, and the probability of that event is a posterior summary of the qoi.

• elin says:

The thing is, what these election models are doing is attempting to predict the probability of a single event, like the next coin flip. In Silver’s case especially, he is saying: here is how much the payoff of a bet on Clinton would be if you bet right now. Right now it is much less than even money. This is really different from predicting what the probability is for 1000 observations with the same set of parameter variables. In this case, the percent itself is the level of uncertainty. The farther away it is from 1 or 0, the more uncertain the model is in its prediction that Clinton will win. We might be interested in the probability of winning because we are interested in statistics or political science or sociology or whatever. Silver is interested in laying the correct odds of a dichotomous event.

• anon says:

>We might be interested in the probability of winning because we are interested in statistics or political science or sociology or whatever. Silver is interested in laying the correct odds of a dichotomous event.

What is the difference between the “correct odds” and the probability of winning? Seems to me that these are the exact same thing (although calling odds or a probability correct can only ever be with respect to a state of knowledge. Laplace’s demon’s correct odds are 1 if Clinton wins and 0 otherwise).

• elin says:

Silver comes at this from a sports background. He wants to know how much the payoff of a bet that Trump wins today should be. To a bookie, slightly wrong odds (or probability, or however you want to present the same information about how much a winning bet pays) have real financial consequences. If the odds are too high and the low-likelihood event happens, you have to pay out a lot. On the other hand, high odds will attract a lot of bettors whose money you would like. Even though you might select from the same tools, that is a different motivation than other people have for their models or for choosing between modeling approaches.

14. David Blake says:

How is saying that you weigh 143.217 pounds innumerate if you weigh…143.217 pounds? Maybe you will weigh a bit more or less than that in a minute and it is probably more information than I really want. But it is not innumerate.

• Andrew says:

David:

I think it’s fine to say, “Just now I got on a scale and it gave the reading 143.217 pounds,” but I think it’s in error to say “My weight is 143.217 pounds,” for two reasons. First, the scale is not so accurate, so 143.217 is not even “my weight” at the time I stepped on the scale, it’s just the reading that the scale gave. Second, my weight varies by much more than that just based on drinking a glass of water or whatever or from day to day, so it would not be correct to call this number “my weight.” If you got on some sort of hyper-precise weighing instrument that could really weigh you to an accuracy of 1/1000 pound, then I wouldn’t object to your saying, “My weight at this moment is 143.217 pounds.”

Search this blog for kangaroo and feather for more on this issue.

In the election probability example, my point was that a change in win probability of 0.1% corresponds to a 0.004 percentage point share of the two-party vote. That’s far far below the noise in any of our measuring instruments.
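A back-of-the-envelope sketch of this kind of conversion, assuming a normal forecast distribution for the two-party vote share. Under that assumption the win probability is Φ((μ − 50)/σ), so its slope with respect to the forecast mean μ is φ(z)/σ with z = (μ − 50)/σ. The means and standard deviations below are hypothetical, and the exact answer depends on them, so this is not a reconstruction of the calculation behind the 0.004 figure.

```python
from statistics import NormalDist

def share_shift_for_probability_change(mean_share: float, sd: float,
                                       delta_prob: float) -> float:
    """Approximate shift in mean vote share (percentage points) that produces
    a change of delta_prob in win probability, via a first-order approximation."""
    z = (mean_share - 50.0) / sd
    slope = NormalDist().pdf(z) / sd  # d(win probability) / d(mean share)
    return delta_prob / slope

# Hypothetical (mean, sd) pairs; delta_prob = 0.001 is a 0.1% change in win probability.
for mean_share, sd in [(51.0, 1.5), (52.0, 2.0), (52.0, 3.0)]:
    shift = share_shift_for_probability_change(mean_share, sd, 0.001)
    print(f"mean = {mean_share}, sd = {sd}: shift of about {shift:.4f} percentage points")
```

For these made-up inputs the implied shift stays below a hundredth of a percentage point, far smaller than anything a poll average can resolve.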

15. zbicyclist says:

This morning, The Upshot at the NYTimes summarized the predictions: Their model gives Trump a 16% chance. 538 gives Trump a 36% chance, and Huffington Post gives Trump less than a 2% chance.

But the Huffington Post http://elections.huffingtonpost.com/2016/forecast/president was giving Tennessee’s 11 electoral votes definitely to Hillary, which looked like a mistake. I was one of many commenters to question this unexpected swing of 22 electoral votes. They’ve now changed Tennessee to the (expected) definite Trump column, but still left the odds at less than 2%.

I see from Facebook that there are replies to my comment, but there are too many comments there to wade through and find out what was said. I’d love to know if they somehow had Tennessee reversed when they did that low percentage calculation.

16. Chris G says:

> No big deal but I’m 66.24326382364829790019279981238980080123% sure he’s doing the wrong thing.

Let’s see, that’s 1 part in 10^40 or, roughly, 2^133. Wow, octuple precision. I’m impressed.

17. Paul Alper says:

This critique of Nate Silver’s methodology ought to provoke some response:

http://www.huffingtonpost.com/entry/im-a-stats-prof-heres-why-nate-silvers-model-was-all-over-the-place_us_582238dce4b0d9ce6fbf69b6

“Quite simply, his modeling approach is overly complicated and baroque. It has so many moving parts that it is like an animal with no bones. That is why it then has so many places where he has to impose his (hopefully unbiased) views. The problem with this is that he could push the results around quite a bit if he wanted to. That doesn’t mean he is purposely rigging the model; and, I don’t suspect he is.”