Right, Daniel, I absolutely agree. Tweaking until the prior predictives look sensible (discarding the polling data) seems like a reasonable process of constructing a prior compatible with prior knowledge. Tweaking the posterior predictives — with anything allowed! — is wild.

I think the point about it being worthwhile to tweak the *prior* rather than the posterior is a good one. It’s perfectly reasonable, for example, to construct a prior by starting with something vague, then sticking in some “fake data” to constrain the prior down to something you like, and then going from there to add in the real data. This is worth thinking about.
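For instance, a minimal sketch of the fake-data idea in a conjugate Beta setting (all numbers here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Start with a vague Beta(1, 1) prior on a candidate's two-party vote
# share, then add "fake data" (pseudo-counts) to pull the prior toward
# plausible values -- all before touching any real polls.
a, b = 1.0, 1.0      # vague starting prior
a += 25.0            # fake data: 50 pseudo-respondents split 25/25,
b += 25.0            # i.e., centered at a 50% vote share

# Prior predictive check: simulate vote shares and ask if they look sensible
draws = rng.beta(a, b, size=10_000)
print(draws.mean())                              # should sit near 0.5
print(np.mean((draws < 0.3) | (draws > 0.7)))    # mass on implausible shares
```

The same check works for non-conjugate models: simulate parameters from the prior, data from the likelihood, and judge whether the implied outcomes are believable before conditioning on anything real.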

Andrew (other):

You write: “Just use a computer to do lots of tweaking until you get the desired conclusion, e.g. candidate X wins by Y margin.” But we’re not just spitting out one number! We’re spitting out a 50-dimensional posterior distribution, which is mostly summarized by 50 estimates and a covariance matrix.

The answer to, “what is allowed and what is forbidden,” is that anything is allowed. If you don’t like it, that’s fine, but there aren’t really any alternatives, at least not yet. This sort of opinion forecasting is not a mature science. There are too many sources of nonsampling error floating around for this to be done automatically. Or, to put it another way, there are ways it could be done automatically, but I’d have no good reason to trust the results.

Or, to put it yet another way: you might not trust a model that my colleagues and I have built after examining its predictions and checking that they are reasonable. That’s fine. But then you really really really shouldn’t trust a model whose predictions *haven’t* been checked in this way!

I’m not sure what you mean. The fact that it’s not a one-to-one mapping from assumptions to conclusions means that the tweaking of assumptions required to effect the desired change in the conclusion of interest could be complicated.

But that just seems like a computational problem. Just use a computer to do lots of tweaking until you get the desired conclusion, e.g. candidate X wins by Y margin.

That computational problem doesn’t seem relevant to the question of methodology: what are the rules? where does the tweaking stop? what is allowed and what is forbidden?

I would feel slightly more comfortable if you were tweaking the *prior* predictives rather than the posterior ones, but I don’t think that’s what you are doing. It would also be very interesting to know the *prior* predictive for the outcome of the election, before conditioning on any polling data.

Andrew (other):

Your comment would be correct if our model were doing nothing but forecasting a single scalar parameter. But that’s not the situation we’re in. We’re predicting all 50 states at once, which allows us to look at things like the predictive probability that Trump wins the national election conditional on him winning California, etc. It’s not a simple one-to-one mapping from assumptions to conclusions.
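As an illustration of that point, a toy simulation (states, electoral votes, and error scales all invented) shows how conditional queries drop out of joint draws by simple subsetting:

```python
import numpy as np

rng = np.random.default_rng(1)

# Three made-up states with a shared national swing plus state-level noise.
n_sims = 20_000
ev = np.array([55, 38, 29])                  # toy electoral votes (122 total)
mean_margin = np.array([0.15, -0.02, 0.01])  # toy mean margins for candidate A
national = rng.normal(0.0, 0.03, size=(n_sims, 1))  # correlated swing
state = rng.normal(0.0, 0.02, size=(n_sims, 3))     # independent noise
margins = mean_margin + national + state

wins = margins > 0
ev_won = wins.astype(int) @ ev
p_national = np.mean(ev_won >= 62)           # toy majority threshold

# Conditioning on winning the competitive state is just subsetting draws:
p_conditional = np.mean(ev_won[wins[:, 1]] >= 62)
print(p_national, p_conditional)
```

Because the shared swing correlates the states, conditioning on one state’s outcome moves the national probability, which is exactly the kind of implication worth checking.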

“I mean, what are the rules? Where does the tweaking stop?”

Yes, I guess that’s my question. Perhaps we need a statement and paper by a group of eminent statisticians with guidelines for tweakers.

See the Borges story: On Rigor in Science.

Going even further, I can’t see why we don’t just look at the end result – it predicts X to win by that much? – and judge whether we find it reasonable, and if not, tweak the model until we get what we wanted. I mean, what are the rules? Where does the tweaking stop? What features in the predictions are you allowed to remove through tweaking the model?

“based on…scientific knowledge”

If that’s what it’s based on, great. But intuition isn’t scientific knowledge. And I’m not even saying intuition is wrong. It’s just a lot harder to defend.

“I trust their judgments.”

I trust Andrew too. I trust that Andrew and all the people he works with are doing the best that they can. It doesn’t mean they’re always right. It’s not always clear what’s the right thing to do.

“So you can reasonably estimate the probability of a candidate winning just by reviewing the raw polling data?”

Yep. If the polls are heavily in favor of Biden, which they are now, I bet on Biden. If they were heavily in favor of Trump, I would bet on Trump. If they were too close to call, I wouldn’t bet: the exact same thing I’d do if there were a model with purportedly quantitative probabilities.

“Or maybe you’re saying we can’t possibly make a reasonable prediction with a statistical model because there is too much uncertainty?”

I’m saying that all you’re doing is expending a pile of work to regurgitate what we already know from the polls: Biden is the likely winner.

So you can reasonably estimate the probability of a candidate winning just by reviewing the raw polling data? You should write out how to do that so others can understand, but don’t use any “model” though because we are trying to avoid that.

Or maybe you’re saying we can’t possibly make a reasonable prediction with a statistical model because there is too much uncertainty? What should we do then? Listen to experts, but only the ones who don’t use advanced statistical analysis?

I think the hurricane example is actually reasonable too.

I mean, if I were forecasting hurricanes, it would seem kind of dumb, but that’s because I have no business forecasting hurricanes. But for a meteorologist to do that, based on their own scientific knowledge, as part of a research community, and with a commitment to respecting the data to the extent it is reliable, it seems very defensible.

I think Gelman/Heidemanns as well as the FiveThirtyEight team know a lot about polls and elections and I trust their judgments. (Although, see Nate Silver’s 2016 primary failure for a way this can go wrong.)

I guess this only works if the modeler’s subjective judgment is good, and they are working in good faith and not fooling themselves.

The Formula That Killed Wall Street

“Here are two naive interpretations of the polls”

But no one would advocate either, right? The reality is that usually “the polls” means N *polls* each with n *respondents*, where N is in the hundreds if not thousands, including all state polls over the year leading up to the election. People are already doing their own aggregating from many, many polls over a period of a year, and usually the polls are right.

It makes a lot of sense to “understand the implications” of a model. What’s not as clear is how those implications should be judged – what constitutes a “right” or “wrong” outcome of the model – and whether or how the model should be changed to reflect a judgement call, particularly when that judgement call is about an abstract probability in the tail of a model distribution.

In the case of elections it’s doubly academic, since a) these seem to be tweaks in the noise; and b) they won’t change the outcome. But OTOH the judgement-call scale is a smooth gradation, so I can see people getting increasingly comfortable tweaking in decreasingly certain judgement situations, then applying that to something that matters.

(aka ilikegum@fastmail.com)

“I think the case for election models is pretty clear if you look at 2016.”

I would say just the opposite. Models did no better than polls. I guess you’re going to argue that because Nate gave Trump a moderate chance of winning, the models were better, but that’s a low-viscosity argument. No one gave that any credit, and really no one knows what that even means. Most people interpreted it as “Clinton will win.” If the models had given Trump a >=45% chance of winning while the polls were giving it hands-down to Clinton, I might buy your argument.

I think the case for election models is pretty clear if you look at 2016. Conventional wisdom was that Clinton would definitely win, but 538 predicted around a 30% chance of Trump winning. Correlated polling errors in a few states can cause election outcomes that are not easy to predict from just raw polling data. You need to aggregate the polling data, quantify the historical accuracy of polls, incorporate your prior knowledge about election fundamentals, and then simulate all the “what if” scenarios. If you formalize the process in a model, then you can automatically incorporate new data in near real time, track how the race has changed over time, and quantitatively compare/contrast with other forecasts.
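As a toy sketch of just the aggregation-plus-error step (the poll numbers and the 2-point nonsampling term are invented):

```python
import numpy as np

# Combine several polls of one race by inverse-variance weighting, then
# widen the uncertainty for nonsampling error (house effects, frame
# problems, late swings). All figures below are made up.
shares = np.array([0.52, 0.54, 0.51, 0.53])  # poll toplines for candidate A
n = np.array([800, 1200, 600, 1000])         # sample sizes

var_sampling = shares * (1 - shares) / n     # "urn" variance per poll
w = 1.0 / var_sampling
estimate = np.sum(w * shares) / np.sum(w)
se_sampling = np.sqrt(1.0 / np.sum(w))
se_total = np.sqrt(se_sampling**2 + 0.02**2)  # add ~2pt nonsampling error
print(estimate, se_sampling, se_total)
```

The pooled sampling error shrinks as polls accumulate, but the nonsampling term does not, which is one reason averaging a hundred polls does not buy a hundred times the certainty.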

“Reality is more complicated than any model we can build—especially given all the shortcuts we take when modeling.”

This observation deserves a name, which deserves to be in the lexicon.

(Any suggestions? Maybe just Reality?)

Ilikegum:

The polls are data. They have value, but they need to be interpreted. Here are two naive interpretations of the polls that are wrong:

1. The polls should be taken literally, as if a poll with N respondents is the equivalent of a random sample of N balls from an urn.

2. The polls should be completely ignored as they provide no relevant information about the election.

There’s good evidence, both empirical and theoretical, against interpretation #1 and interpretation #2. Reasonable people all agree that we need something in between. A model (or, more generally, a forecasting method) is a way of getting something in between #1 and #2.
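A quick calculation shows the problem with interpretation #1; the ~3.5-point figure for empirical polling error below is an illustrative assumption, not a measured value:

```python
import math

# Literal "urn" standard error for a 1,000-respondent poll at 50/50
n = 1000
p = 0.5
urn_se = math.sqrt(p * (1 - p) / n)   # about 1.6 percentage points

# Polls historically miss by more than pure sampling error predicts
empirical_err = 0.035                  # illustrative assumption
print(urn_se, empirical_err / urn_se)
```

If real polling errors run at double the urn-model standard error, intervals built on interpretation #1 are badly overconfident, while interpretation #2 throws away genuinely informative data; a model is how you land in between.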

As for hurricanes: I’ve never tried to forecast them, but my colleagues and I have modeled all sorts of other things, and, yes, our workflow does involve going back and forth between modeling and examining the implications of our models.

“It seems that the election predictions are being treated as just the subjective judgment of the well-informed human experts who made it (which seems like the right way to look at it).”

I find that a troubling aspect of this approach. It’s unclear which part of the model is mechanical and which part is a personal opinion. You can fix mechanical problems. You can’t fix personal opinion: no one knows if it’s right or wrong.

All well and good for election “modelling”: the model isn’t adding much if any value to the polls. The polls in this race are clear. But if they weren’t clear – if it was ±2% – would the model add any certainty? Unlikely – again, unless the certainty was from some other obvious source, for example that it was clear one candidate would carry all the large states.

So that’s actually a good question: what does this or Nate’s model tell us that we can’t already see from the polls?

What if you were using this approach to forecast, say, hurricanes? Would you be going “well, jeez, in the tails it says one might occur in Maine, but no, really I don’t think so, so I’m going to tweak it out”?

Yuling:

I’m not sure, but I’ll say one thing: our state correlations are a hack, constructed by all sorts of weird manipulations of a past correlation matrix in order to get something that seemed reasonable to us and that gave reasonable results when used with past elections.
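For readers curious what such manipulations can look like, here is a generic sketch (not the actual procedure used): shrink a noisy historical correlation matrix toward a smooth target, then repair the result into a valid correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
k = 5

# A correlation matrix estimated from too few observations is noisy
noisy = np.corrcoef(rng.normal(size=(10, k)), rowvar=False)

# Shrink toward a uniform-correlation target (weight 0.5 is invented)
target = np.full((k, k), 0.7)
np.fill_diagonal(target, 1.0)
shrunk = 0.5 * noisy + 0.5 * target

# Repair: clip any negative eigenvalues, then renormalize the diagonal to 1
vals, vecs = np.linalg.eigh(shrunk)
fixed = vecs @ np.diag(np.clip(vals, 1e-6, None)) @ vecs.T
d = np.sqrt(np.diag(fixed))
fixed = fixed / np.outer(d, d)
print(np.linalg.eigvalsh(fixed).min() > 0, np.allclose(np.diag(fixed), 1.0))
```

The eigenvalue-clipping step matters because ad hoc edits to individual correlations can easily produce a matrix that is not positive semidefinite and therefore not usable for simulation.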

Thanks for taking the time to respond.

I agree all models are flawed and most code has bugs. Making your code open source helps find problems, but if you’re paying well-qualified people a salary to do stuff like code reviews and QA, then that can work well too.

You may be right that 538’s model has serious flaws, I was mostly taking issue with describing it as “clearly flawed”. In your series of blog posts it didn’t seem that clear cut to me as a reader, but maybe I lost the thread.

N:

Yes, all things are possible. Nobody’s saying these probabilities should be exactly zero. It’s all about the numbers. So, yes, “to a lesser extent.” You say, “It is not difficult for me to imagine scenarios where rare state specific events (e.g., scandal, natural disaster, denying federal disaster aid to specific state) shift the outcome in a particular state largely independent of the national outcome.” I could see this causing a shift of a couple percentage points, but not 10 percentage points. The only such examples I can think of historically are third-party challenges, and there are no state-specific third-party challenges in this election. So, again, the question is not about theoretical possibility (i.e., probability greater than 0), it’s about the probability.

Yet another way to say it is that these models are constructed by humans and are imperfect. I would be stunned if the Economist’s and Fivethirtyeight’s predictions did *not* have serious flaws somewhere, if you look hard enough. How could it be otherwise? Our default assumption should be that the predictions are flawed, and it makes sense that any time we look carefully at some aspect of the predictions that hasn’t been carefully studied before, we uncover issues.

But it’s been interesting to watch this reflective process, of looking for odd predictions made by the model. It seems that when the model makes one of these weird predictions, the response is “that’s not what I was trying to say, let’s fix the model”. It seems that the election predictions are being treated as just the subjective judgment of the well-informed human experts who made it (which seems like the right way to look at it).

For Bayesians I guess the theoretically ideal thing to do would just be to write down all the probability distributions, treating each as the modeler’s subjective best judgment, and making sure they are consistent. Of course this is computationally hopeless even if everything is discrete.

So then maybe one purpose of a statistical model is simply as a tool that makes it possible to express these judgments at all. The choice of model and priors would just be a more compact, tractable alternative to telling everyone the full distribution. Posterior predictive checks would test whether the model is expressing the opinions that it should.

All this is very influenced by the ideas of Bayesian practitioners such as the main author of this blog. However, the perspective of statistical modeling as just a language for compactly expressing beliefs is one I have not run into before as far as I know, at least not stated in that exact way. Would be very interested in reading more stuff from this perspective — I would guess people have thought in these terms before.

What got me thinking about it was the use of “bidding languages” in combinatorial auctions, where you have to come up with some way for people to express preferences that doesn’t involve setting a price on all 2^N combinations of items for sale.

Correlations between the movements of 2 financial instruments are easy to estimate for reasonably typical market conditions.

But one often wants to know the correlation for very large moves. That’s much harder.

A very experienced trader once said to me that when markets go wild, all correlations -> +1 or -1.

This is why a portfolio of “independent” diversified securities does not give the protection one would wish if there is a market meltdown.
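The arithmetic behind that last point is simple; a sketch with an equal-weight portfolio and a common pairwise correlation (parameters are illustrative):

```python
import math

def portfolio_vol(n_assets, sigma, rho):
    # Volatility of an equal-weight portfolio of n assets, each with
    # volatility sigma and common pairwise correlation rho
    var = (sigma ** 2 / n_assets) * (1 + (n_assets - 1) * rho)
    return math.sqrt(var)

normal = portfolio_vol(50, 0.20, 0.10)   # calm markets: diversification works
crisis = portfolio_vol(50, 0.20, 0.95)   # correlations -> 1: almost no benefit
print(normal, crisis)
```

As rho approaches 1, portfolio volatility approaches the single-asset volatility sigma no matter how many assets are held, which is the trader’s observation in formula form.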