“Quite simply, his modeling approach is overly complicated and baroque. It has so many moving parts that it is like an animal with no bones. That is why it then has so many places where he has to impose his (hopefully unbiased) views. The problem with this is that he could push the results around quite a bit if he wanted to. That doesn’t mean he is purposely rigging the model; and, I don’t suspect he is.”

]]>It would certainly be an interesting study to do, given third party and independent options and how US states differ in terms of open and closed primaries. And they might not be uncorrelated, either. Of course the shift of African American voters from Republican to Democratic and of Southern whites from Democratic to Republican is the obvious set of examples to look at, but those were not casual individual decisions and took many years to play out. It will be extremely interesting to watch how this election plays out in that respect, since you have many high visibility Republicans saying they are voting top of the ticket Democratic but the rest Republican; obviously in that case PartyID would still be Republican. What happens after that, we shall see.

]]>True but is it unreasonable to assume that political identity is more fluid than gender or religion?

]]>There’s test retest data from panel studies, as Andrew has said, but there is also broad data on the importance of central identities of all kinds. Like I said, some people do change their reported gender identities and that is very real, but some people also make data entry mistakes. Some people experience religious conversion. But more people just fade out of their childhood religious identities.

]]>So, it’s an “assumption” not an “observation”?

We take it axiomatically that any changes must be differential non-response & not a fundamental opinion change?

]]>Silver comes at this from a sports background. He wants to know how much the payoff of a bet that Trump wins today should be. To a bookie, slightly wrong odds (or probability, or however you want to present the same information about how much a winning bet pays) have real financial consequences. If the odds are too high and the low likelihood event happens, you have to pay out a lot. On the other hand, high odds will attract a lot of bettors whose money you would like. Even though you might select from the same tools, that is a different motivation than other people have for their models or for choosing between modeling approaches.

]]>>We might be interested in the probability of winning because we are interested in statistics or political science or sociology or whatever. Silver is interested in laying the correct odds of a dichotomous event.

What is the difference between the “correct odds” and the probability of winning? Seems to me that these are the exact same thing (although calling odds or a probability correct can only ever be with respect to a state of knowledge. The correct odds for Laplace’s demon are 1 if Clinton wins and 0 otherwise).

]]>I think your assumption is a bit strong. Why would you think there would not be anyone who would swing from one extreme to the other when there are certainly people who tend towards that type of thinking and behavior? People who are described as “black & white thinkers” who dismiss shades of gray might be likely to change their reported affiliation from one category to another skipping the ‘none of the above’ option.

]]>You don’t start reporting you are a different religion just because your religion has a scandal. You report no religion or you don’t report a religion. Identities are just not that fluid.

Even the changers Andrew found could be measurement error, just like possibly some people who change genders really change genders but sometimes people click on the wrong circle.

]]>Let’s see, that’s 1 part in 10^40 or, roughly, 2^133. Wow, octuple precision. I’m impressed.

]]>But the Huffington Post http://elections.huffingtonpost.com/2016/forecast/president was giving Tennessee’s 11 electoral votes definitely to Hillary, which looked like a mistake. I was one of many commenters to question this unexpected swing of 22 electoral votes. They’ve now changed Tennessee to the (expected) definite Trump column, but still left the odds at less than 2%.

I see from Facebook that there are replies to my comment, but there are too many comments there to wade through and find out what was said. I’d love to know if they somehow had Tennessee reversed when they did that low percentage calculation.

]]>So he monetizes his website and predictions? (Sorry, I’m like you, with tenure and with a salary, I have nothing to worry about when I release my data+code). I was pretty frustrated when I read his book that he would not talk about the details and provide accompanying code.

I really admire this other guy who’s laying it all out publicly.

]]>The thing is, what these election models are doing is attempting to predict the probability of a single event, like the next coin flip. In Silver’s case especially, he is saying here is how much the payoff of a bet on Clinton will be if you bet right now. Right now it is much less than even money. This is really different than predicting what the probability is for 1000 observations with the same set of parameter variables. In this case, the percent itself is the level of uncertainty. The farther away it is from 1 or 0, the more uncertain the model is in its prediction that Clinton will win. We might be interested in the probability of winning because we are interested in statistics or political science or sociology or whatever. Silver is interested in laying the correct odds of a dichotomous event.

]]>Paul:

It’s all dominated by Kremp’s model, as far as I’m concerned. The beauty of Bayesian inference is that you don’t have to think about things like oversteering or meta-margins; you just set up your model and fit it to data.

]]>Below is a link to Sam Wang’s discussion of his criticism of both the Huffington Post and fivethirtyeight:

http://election.princeton.edu/2016/11/06/is-99-a-reasonable-probability/#more-18522

“the Huffington Post claim that FiveThirtyEight is biased toward Trump is probably wrong. It’s not that they like Trump – it’s that they are biased away from the frontrunner, whoever that is at any given moment. And this year, the frontrunner happens to be Hillary Clinton.

And then there is the question of why the FiveThirtyEight forecast has been so volatile. This may have to do with their use of national polls to compensate for the slowness of state polls to arrive. Because state opinion only correlates partially with national opinion, there is a risk of overcorrection. Think of it as oversteering a boat or a car.”

Wang further claims his

“approach has multiple advantages, not least of which is that it automatically sorts out uncorrelated and correlated changes between states. As the snapshot changes from day to day, unrelated fluctuations between states (such as random sampling error) get averaged out. At the same time, if a change is correlated among states, the whole snapshot moves.

The snapshot gets converted to a Meta-Margin, which is defined as how far all polls would have to move, in the same direction, to create a perfect toss-up. The Meta-Margin is great because it has units that we can all understand: a percentage lead. At the moment, the Meta-Margin is Clinton +2.6%.”

]]>I think that the point of computing the probability of an event is that the outcome is binary. Also that if there are 3 or more candidates in an election that does not require a plurality predicting the percent for a given candidate is not necessarily going to tell you if that candidate wins, which is what we really care about, especially if we are betting.

]]>If the national outcome is a binary variable, Clinton or Trump, and we assume that any predicted probability for a candidate greater than .5 means that they are predicted to be the winner, have we really seen a lot of variation? That’s not my impression. Variation on size of win, yes; variation on probability of win, yes; but variation on who will win? Not so much. There is a reason spread bets and money line bets both exist. In the end, after the fact, the probability for this specific event will be known to be 1 or 0, so you could say that the best model is the one that is giving the predicted probability closest to 1 or 0. I actually wouldn’t do that in this context (because the results of voting by millions of people are different than individual votes), but you could. We should also remember the way the slope changes in a binary model depending on where you measure it, with very high slopes if you are in the middle, moving toward 0 as you get to the extremes. If you have a model with predicted probability around .50 then yes, it is going to appear to swing wildly when there are changes to the inputs.
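The point about slopes can be made concrete with a minimal sketch. It assumes a plain logistic link, which is only one possible binary model and not necessarily what any of the forecasters use; the function names are illustrative.

```python
import math

def logistic(x):
    """Map a linear predictor x to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_slope(x):
    """Derivative of the logistic curve at x, which equals p * (1 - p)."""
    p = logistic(x)
    return p * (1.0 - p)

# The slope peaks where p = 0.5 and shrinks toward 0 at the extremes,
# so the same change in inputs moves a near-50% forecast the most.
for x in (0.0, 1.0, 2.0, 4.0):
    print(f"x={x:.1f}  p={logistic(x):.3f}  slope={logistic_slope(x):.3f}")
```

Under this assumed link, a forecast sitting near .50 reacts more to the same input change than one sitting near .95.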

]]>“small differences … can lead mathematically to apparently large differences ….” i.e. sensitivity to initial conditions, as far as the system generating the forecast probability.

]]>David:

I think it’s fine to say, “Just now I got on a scale and it gave the reading 143.217 pounds,” but I think it’s in error to say “My weight is 143.217 pounds,” for two reasons. First, the scale is not so accurate, so 143.217 is not even “my weight” at the time I stepped on the scale, it’s just the reading that the scale gave. Second, my weight varies by much more than that just based on drinking a glass of water or whatever or from day to day, so it would not be correct to call this number “my weight.” If you got on some sort of hyper-precise weighing instrument that could really weigh you to an accuracy of 1/1000 pound, then I wouldn’t object to your saying, “My weight at this moment is 143.217 pounds.”

Search this blog for kangaroo and feather for more on this issue.

In the election probability example, my point was that a change in win probability of 0.1% corresponds to a 0.004 percentage point share of the two-party vote. That’s far far below the noise in any of our measuring instruments.
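A rough way to see where a number like that comes from, as a sketch only: assume the forecast two-party vote share is normally distributed with a standard deviation of 1.5 percentage points and that the win probability is about 65%. Both values are assumptions chosen for illustration, not figures taken from any of the models discussed. With these assumed values the shift comes out at about 0.004 percentage points.

```python
from statistics import NormalDist

# Assumptions for illustration only (not from any published model):
FORECAST_SD = 0.015   # sd of the forecast two-party share (1.5 points)
WIN_PROB = 0.65       # current win probability

std_normal = NormalDist()

def share_shift_for_prob_change(delta_prob, win_prob=WIN_PROB, sd=FORECAST_SD):
    """Change in mean vote share that moves the win probability by delta_prob.

    Uses a first-order approximation: win_prob = Phi((mean - 0.5) / sd),
    so d(win_prob)/d(mean) = phi(z) / sd with z = Phi^{-1}(win_prob).
    """
    z = std_normal.inv_cdf(win_prob)
    slope = std_normal.pdf(z) / sd
    return delta_prob / slope

shift = share_shift_for_prob_change(0.001)  # a 0.1% change in win probability
print(f"{100 * shift:.4f} percentage points of two-party vote share")
```

A different assumed standard deviation scales the answer proportionally, but it stays far below polling noise for any plausible value.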

]]>Shravan:

I wouldn’t call it “hiding stuff” to not post one’s model and code. To call it “hiding” seems to imply that everything is open and available by default and that some special effort would be required to hide it. Really, though, the default is for people to not describe exactly what they’re doing, and to not share code. That’s too bad, but Nate is following standard practice here by putting up results without providing the materials by which others could replicate it.

Short-term, it’s harder to put all your code and methods out there, as this puts some pressure on you to write things up clearly, clean up your code so it can be run by others, etc. Kremp did this but I think it took a bit of effort on his part.

There’s also a business reason, or at least a perceived business reason, for people not to share methods and codes. Maybe Nate and his employers don’t want someone to start a copycat site. It’s easy for me to say how great open-source is; I have a salary, and our work on Stan is supported by various research grants. Nate and his colleagues are living in the commercial media world, and maybe they feel like it’s a selling point that their method has secrets.

Longer term, I think openness is good because it can allow outsiders to find problems in your code and methods so you can improve. But Nate hasn’t seemed to be so interested in getting constructive comments from outsiders. He performs self-evaluation when he’s been wrong, and that’s good, but I can’t recall any examples where he thanked others for finding problems with what he’s doing.

One way I’ve tried to align the short and long-term incentives is by publicly celebrating Kremp’s open-source efforts.

]]>What does Nate Silver have to gain by hiding stuff?

]]>If you had congestive heart failure, you could be required to limit your daily fluid intake to 1.5 liters, even if it’s really hot out — and if you were not good at meeting that limit, you’d be prescribed a diuretic to reduce any net fluid gain.

]]>I think, they are updating every time a new poll comes in. Maybe it is too often, but it is a fairly objective procedure.

]]>No, nothing changes means either “the new information is very similar to the old information” OR the new information is trivial and is given little weight by the model. My concern is with the second case.

]]>P.S. You write, “my quantity of interest is the probability of a candidate winning.” Strictly speaking (using BDA terminology), the qoi is *the event* that your candidate wins, and the probability of that event is a posterior summary of the qoi.

But “Nothing has really changed” is itself valuable information. The forecast updates every time they get new polls and add them in; if there are new polls there is new information. If nothing else changes, that means the new information is very similar to the old information. But “nothing has changed in the past 24 hours/1 hour/20 minutes” is itself information.

]]>Bob:

I agree that 66% vs. 87% is a disagreement; my point is that this disagreement comes from essentially uncheckable aspects of the model.

I think it’s fine that Silver, Kremp, etc., give win probabilities. All they can really do is state their assumptions, say where their data came from, and give their conclusions. It just turns out that this particular conclusion depends a bit on parameters which are inherently difficult to estimate given that presidential elections don’t happen very often.

]]>So my question is: what should the authors have done rather than reporting a 66% or 87% chance of Clinton winning? Evaluating the event probability is easy:

$latex \mbox{Pr}[\mbox{clinton wins}] = \int_{\theta} \mbox{Pr}[\mbox{clinton wins} \, | \, \theta] \, p(\theta \, | \, y) \, \mbox{d}\theta$

where $latex y$ is data and $latex \theta$ are model parameters. But that only gives you an event probability estimate, like 66%, not any kind of uncertainty in the event estimate.
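In practice the integral is usually approximated by averaging over posterior draws. Here is a minimal sketch under made-up assumptions: $latex \theta$ is the expected two-party share for Clinton, its posterior is normal, and the election-day result adds normal noise around $latex \theta$. All numeric values and names below are hypothetical, and the draws stand in for what a fitted model would produce.

```python
import random
from statistics import NormalDist

random.seed(0)

# Hypothetical values, standing in for output from a fitted model.
POSTERIOR_MEAN = 0.52    # posterior mean of theta (expected Clinton share)
POSTERIOR_SD = 0.02      # posterior sd of theta
ELECTION_DAY_SD = 0.01   # extra noise in the result given theta

def pr_win_given_theta(theta):
    """Pr[clinton wins | theta]: chance the realized share exceeds 50%."""
    return 1.0 - NormalDist(theta, ELECTION_DAY_SD).cdf(0.5)

# Draws from p(theta | y); a real analysis would take these from the sampler.
draws = [random.gauss(POSTERIOR_MEAN, POSTERIOR_SD) for _ in range(20000)]

# Monte Carlo estimate of the integral: average the conditional probability.
pr_win = sum(pr_win_given_theta(t) for t in draws) / len(draws)
print(f"Pr[clinton wins] is approximately {pr_win:.3f}")
```

The result is a single number, which is the point being made: the uncertainty in $latex \theta$ has already been averaged out.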

]]>1. “Fresher” in the sense of having been updated more recently. When comparing, for example, 538 and the Upshot, the casual reader will probably give more weight to the fresher update. It may also have an SEO advantage.

2. “If they are different” is not a very useful standard. The updates of the past 24 hours did change the forecast, but only from 64.7% to 64.9%. I think there’s a good chance that the new information was fairly trivial.

]]>Terry:

Yes, I think it’s definitely important to add a factor into the model to allow for nonsampling errors of various sources, as well as the possibility of genuine last-minute shifts in political opinions.

]]>As phone technology changes, are polls accurately sampling actual voters?

Hillary has much more money to get out the vote. How much will that increase her vote? Respondents who say they will vote for Trump, but don’t make it to the polls because they don’t have a ride or because they aren’t exhorted enough aren’t actually voters.

How big is the “shy Trump voter” effect?

Nate Silver seems to be recognizing this when he says “there are enough unknown unknowns running around that we want a wider probability distribution”. But is he still underestimating model risk?

I don’t know the answer, but I suspect pollsters are insufficiently humble about the magnitude of the flaws in their models.

]]>What do you mean by fresher? If they are different it seems that there must be new information. Or do they use a random number generator as an input?

]]>Anon:

I don’t think anyone is screwing up. Yes, 65% != 85% but the difference between these forecasts depends on essentially untestable assumptions about variation in the national swing. Different assumptions give different conclusions, it’s as simple as that.

]]>Stephen:

I don’t think those extra digits are so inexpensive! Attention is not cheap. My preferred way to reduce ambiguity in communication is through transparency in data and methods as is done by Kremp.

]]>I was more making a recommendation about how to communicate uncertainty to the general public as an alternative to round-off rather than actually suggesting something I’d like to do with Nate’s numbers.

Kremp’s stuff is great though, I like that time-series plot a lot. I did try to do some Gaussian process regression with the state-space model you posted a month or so back, but Stan wasn’t quite up to handling the 20,000 element covariance matrix.

]]>This strikes me as a communication question more than a statistical one. Too many digits become unwieldy, but too few become a potential source of confusion.

If I refer to Silver’s 64.9% or 65.3% Clinton forecast, and someone else sees that number on her screen as well, then she’ll figure we’re looking at the same thing. If I refer to 64.9% and she sees 65.3%, it will alert us that we’re not on the same page. Merely saying 65% requires more-elaborated communication in the first instance and risks error in the second.

The extra digit is an inexpensive way to reduce ambiguity in communication.

]]>Also, regarding the specifics of weight gain, a lot depends on the situation. A disposable 1 liter bottled water weighs about 2 pounds. I drink one of those in about 2 minutes after being out in the SoCal sun for a while. This is a weight gain rate of 1440 lbs/day ;-)

When I was in college in Iowa I’d walk to the student union in -25 F weather. When I arrived, the student union would be heated to about 80 F, a temperature change of 105 F, and I’d rapidly strip off about 5 pounds of clothes in a minute or so.

]]>Sure, so lots of measurements have even 3 sig figs. My point still stands: people aren’t used to the idea that maybe you’d do a poll and estimate 64% probability of a win, and you’d really mean somewhere between 60 and 70.

Giving more sig figs and then also giving lots of estimates seems to me to be better than just rounding to say 60 and then when your estimate bumps from 64.3 to 65.5 you bump from 60 to 70! wow that’s a big jump!

I’d rather see something like “we estimate that the probability of a win is: 62.4, 66.3, 61.1, 68.3, 65.2,….” bouncing around on screen every few seconds. This is better than rounding off to 1 sig fig in my opinion.

]]>Daniel said,

“People are used to things having meaningful measurements to 2 sig figs, like my weight, … (if I think that I weigh 180 pounds and step on the scale and notice now i’m 190 pounds, that’s a big deal).”

If I think I weigh 110 pounds and step on the scale and notice it says 115, that’s a big deal.

If you have congestive heart failure and have a weight gain of 3 pounds in one day, or 5 pounds in 1 week, you are supposed to call your doctor’s office.

]]>Toby,

It’s not the amount of time; it’s the frequency. If there’s important information, you should update your forecast immediately, even if your last update was five minutes ago, but 538 updates at a rate that doesn’t seem to be justified by the flow of information. If that’s the case, the estimates aren’t actually being refreshed; they are simply being made to look fresher.

]]>538’s methodology section doesn’t give a complete mathematical description of what they do with the polls or what the model looks like, and they don’t provide the code. It’s not open source.

]]>Mark:

If there is new information that results in an update, then something has changed that much. How much time has passed seems irrelevant.

]]>The classic statistician’s dilemma: do you report what the customer is interested in or what they should be interested in?

The businessman has a very easy answer to that question.

]]>Andrew:

What about significant digits? If the data are that precise, then I don’t see what the fuss is about. If you have a poll with N = 1,000 and 560 vote Hillary and 440 vote Trump, then I see no problem with reporting 76.4 % as the point estimate of the probability that Hillary will win. All that you’re doing when you round it to 76 % is picking a different point on the number line. The less numerate reader will not distinguish between 76.0 and 76 %, and the more numerate reader will realize that it’s a point estimate.

]]>Silver has written in a few places that his model uses a wider-tailed distribution than many of the other models (I think he said he used a t-distribution while other models used a normal), and this makes his forecast more “conservative” in the sense that his probabilities will be closer to 50%. He argues that this is a good idea since there are enough unknown unknowns running around that we want a wider probability distribution.

As for your original claim–my favorite way to think about this comes from a Yglesias column a while back, where he suggested that Silver’s big innovation isn’t mathematically modelling win probabilities–which is a pretty standard statistical exercise–but rather figuring out how to package this as news that people will be interested in consuming. And the extra digit is part of that.

When the 538 crew talks about their projections, they describe them as “About two to one” or whatever. But no one would take them seriously if they published a model whose output was “about 2-1”, and also no one would refresh the page 20 times a day. I think it’s in the same category as the baseball statistics about “this guy is 3-for-4 against this pitcher this season,” which is statistically meaningless but gives you something to talk about during what would otherwise be dead time.
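To see why fatter tails pull a forecast toward 50%, here is a small simulation sketch. It assumes the polling lead equals one scale unit of the error distribution and uses a t-distribution with 4 degrees of freedom; both choices are hypothetical and are not taken from 538’s model or anyone else’s.

```python
import math
import random
from statistics import NormalDist

random.seed(1)

LEAD = 1.0      # hypothetical lead, in units of the error scale
DF = 4          # hypothetical degrees of freedom for the t errors
N = 200_000     # number of simulated elections

def student_t_draw(df):
    """One Student-t draw: standard normal over sqrt(chi-square / df)."""
    z = random.gauss(0.0, 1.0)
    chi2 = random.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)

# The leader wins when lead + error > 0, i.e. when error > -LEAD.
p_normal = NormalDist().cdf(LEAD)
p_t = sum(student_t_draw(DF) > -LEAD for _ in range(N)) / N

print(f"normal errors: {p_normal:.3f}")
print(f"t({DF}) errors:  {p_t:.3f}")
```

With the same lead and the same scale, the t errors put more mass on large misses, so the leader’s simulated win probability comes out lower than under normal errors while staying above 50%.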

]]>