Sg:

This sounds like conservatism on Nate’s part. He trusts traditional survey weighting because it’s been done before and has stood the test of time. The trouble is that samples are getting worse and worse, hence there’s a need to adjust for more and more predictors, and classical weighting starts to fall apart. I discuss some of this in my Struggles paper.

As is often the case in statistics, there is no safe harbor.

]]>@andrew

More of Nate’s MRP comments are here:

https://youtu.be/Thf8weQSPrc?t=1668

Vavreck pushes him on his remarks at 33:00

“It feels like a lot of the MRP stuff hasn’t been backtested” seems like his main critique.

]]>All dumb questions attributable to this one.

]]>Multilevel Regression and Poststratification

basically a technique where you learn to predict the average results of various groups using regression on survey data, and then you figure out the average results of a full population by predicting using the regression equations for the *known* demographics of the whole population, rather than relying on the survey to accurately sample from every demographic group in the appropriate proportion.
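A rough sketch of the two steps described above, in Python. This is not anyone's production MRP code: the "multilevel regression" is replaced by a crude shrinkage of each cell mean toward the overall mean (an assumption made to keep the example short), and the cell names, counts, and `prior_strength` value are all hypothetical.

```python
def partial_pool(cell_yes, cell_n, prior_strength=10.0):
    """Shrink each cell's raw proportion toward the overall sample proportion.

    A stand-in for a real multilevel regression: small cells are pulled
    strongly toward the grand mean, large cells barely move.
    """
    grand_mean = sum(cell_yes.values()) / sum(cell_n.values())
    return {
        cell: (cell_yes[cell] + prior_strength * grand_mean)
        / (cell_n[cell] + prior_strength)
        for cell in cell_n
    }


def poststratify(cell_estimates, population_counts):
    """Average the cell estimates using *known* population counts as weights."""
    total = sum(population_counts.values())
    return sum(
        cell_estimates[cell] * population_counts[cell] for cell in population_counts
    ) / total


# Hypothetical survey that badly under-samples the "young" cell.
cell_n = {"young": 20, "old": 180}
cell_yes = {"young": 14, "old": 72}
# Hypothetical census-style counts: the two cells are actually equal in size.
population_counts = {"young": 500, "old": 500}

raw_estimate = sum(cell_yes.values()) / sum(cell_n.values())
cell_estimates = partial_pool(cell_yes, cell_n)
mrp_style_estimate = poststratify(cell_estimates, population_counts)

print(f"raw sample mean:         {raw_estimate:.3f}")
print(f"poststratified estimate: {mrp_style_estimate:.3f}")
```

The raw mean is dominated by the over-sampled cell; the poststratified number reweights the cell-level predictions to the population's actual makeup. Swapping the population counts for something like market capitalization gives the same arithmetic with different weights.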

]]>The pollsters seem to be taking another pasting in Australia.

]]>I think sports is a reasonable use of a statistician’s time, though I think the baseball analysis has been negative for (real) football. It’s not important, but at least it’s interesting. Polling is a lot of unfalsifiable hot air and pretty charts. Plus I think it’s a negative for your politics.

]]>Jd:

There’s some sports stuff here. And tons and tons of sports-related material here. Also lots of analyses I’ve never published.

]]>“a lot of sports analysis” – where??? That golf example doesn’t count.

“It’s worse than that!” – hey now, I’d go for more sports topics on this blog:) It’s at least as important a topic as gremlins and the end of the world.

Bob:

It’s worse than that! I also do a lot of sports analysis. If you want to talk about unimportant topics that statisticians are obsessed with, sports has got to top the list.

]]>Jordan:

MRP is something you do with raw data, to adjust the sample to match the population. An example is here. Another example is here. When Nate does poll analysis, I think he uses published summaries: rather than taking raw survey responses, he relies on whatever adjustments were done by the polling organizations. I understand this decision on Nate’s part: there are lots of polls out there that will publish summaries but not release their raw data—but the point is that he wouldn’t be using MRP for most of what he does. So maybe he thinks MRP is overrated because he doesn’t use it. The thing is, there’s a lot that can be learned from analysis of survey data. MRP is a really powerful tool.

]]>Nate Silver came up with a player projection system which he named after Carmelo:

https://fivethirtyeight.com/features/how-were-predicting-nba-player-career/

In that introduction he links to an article about Anthony he wrote:

https://fivethirtyeight.com/features/carmelo-anthonys-contract-could-doom-the-knicks-to-mediocrity/

Maybe he thinks Carmelo isn’t as bad as people think but still not worth his contract? But that article is old and Carmelo’s reputation has taken multiple hits since then.

Basically he went to the Thunder and couldn’t adjust his game to be the third option, then went to the Rockets and couldn’t adjust his game to be a role player (he still thought he was a star). So maybe being a Carmelo is thinking you should be the primary option when in reality you should be the fifth or sixth line of evidence?

]]>90+-15 means 90 is the point estimate and the scale of the errors is about 15 percentage points, but the bit about the long left tail was to indicate that they weren’t symmetric and hence you couldn’t have more than 100% white.

in my area of expertise (fisheries) we have people publishing survival estimates for fish passage at dams where the point estimate (not just some portion of the uncertainty interval) is greater than 1

Obviously not Bayesian models. This is exactly the kind of stuff we’ve had discussions about every time Frequentist vs Bayesian comes up. Bayesian models can’t even give intervals that overlap with impossible regions, provided you give the appropriate definition of impossible to the model.

]]>> Since you’re suggesting any election model should be closer to a coin flip based (or maybe I’m misrepresenting what you’re suggesting and you’re narrowly focused on the condition present in the 2016 polls),

No, I’m suggesting that specifically for election models using data of the type being collected “these days”.

Things like “random digit dial, and most people have cell phones and ignore you” or “online polling” or whatever.

If you collect more reliable data, you could easily have predictions that *should* be something like 90/10, but it needs to be something you have strong reason to believe is truly reliable.

]]>Paul:

I did that google and none of the instances of wtf were from me, except for this post right here.

]]>So of course google

“wtf andrew gelman”

and you can see for yourself how frequent that is. But perhaps more relevant is the unnecessary and ubiquitous “like.”

https://news.nationalgeographic.com/news/is-valley-girl-speak–like–on-the-rise-/

]]>And Carmelo Anthony has become the protractor of obtuse metaphors.

]]>Daniel,

Isn’t this something we could assess retrospectively? Obviously not if we only look at the 2016 election, but suppose we took all the cases where we had some facsimile of the kind of polling data that we had in 2016. We’d probably have to extend beyond presidential elections. But we could train models on the polling data using, say, the methods of 538 versus a model that introduces more forms of uncertainty, versus a coin flip. You’re suggesting any election model should be closer to a coin flip (or maybe I’m misrepresenting what you’re suggesting and you’re narrowly focused on the condition present in the 2016 polls), while 538 is willing to be more precise than that. Given enough contests that have pre-election polling data that satisfy our criteria, we should be able to assess which models performed the best.

In fact didn’t 538 do something like this: https://projects.fivethirtyeight.com/checking-our-work/
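One way such a retrospective comparison could be scored is with the Brier score (mean squared error of the forecast probabilities). A minimal Python sketch follows; the outcomes and both forecast series are made up for illustration and are not 538's numbers or anyone's real track record.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better; a constant 50% forecast always scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical contests: 1 means the forecast "favourite" side won, 0 means it lost.
outcomes = [1, 1, 0, 1, 0, 1, 1, 0]

# Hypothetical confident forecaster, with one clear miss (0.3 on an event that happened).
confident_forecasts = [0.9, 0.8, 0.2, 0.7, 0.3, 0.9, 0.3, 0.1]

# The coin-flip benchmark.
coin_flip_forecasts = [0.5] * len(outcomes)

confident_score = brier_score(confident_forecasts, outcomes)
coin_flip_score = brier_score(coin_flip_forecasts, outcomes)

print(f"confident forecaster: {confident_score:.4f}")
print(f"coin flip:            {coin_flip_score:.4f}")
```

With enough comparable contests, the same function could be run on each candidate model's historical forecasts. A model that is sharper than a coin flip only earns a better score if its confidence is backed up by the outcomes.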

]]>I’m not sure what disagreement you mean. I agree with you that the state of information “Clinton has 70% chance of winning” which is approximately what 538 was saying at the time is a state of information that is based on wrong assumptions about the world, and if you persist in using it you’ll continue to be repeatedly wrong. Some of the wrong assumptions are things like

“polling methods that worked when phone calling was a very different activity will continue to be relevant and low bias today”

It could have by extreme fluke been the case that phone polling and other methods were perfectly fine and we just got some weird numbers throughout the whole of the lead-up to 2016, and if we persist in using those old polling methods we’ll do fine next time, but we knew, or should have known, that it wasn’t true.

]]>Oops – Daniel please see below https://statmodeling.stat.columbia.edu/2019/05/15/we-see-mrp-as-a-way-to-combine-all-the-data-pre-election-voter-file-data-early-voting-precinct-results-county-results-polling-into-a-single-framework/#comment-1038710

]]>> In a Bayesian model, it’s “wrong” if it doesn’t represent the state of information we actually thought it should be representing.

Yes, wrong in the sense that if you persisted in using that represented state of information (that is, repeatedly analysed polls using the same state of information), you would persist in being repeatedly wrong.

However, the one time it actually was done in the past, it could have by (extreme) fluke given a reasonable answer. Remember if a tortoise is also predicting the event you are predicting, it is not impossible to lose to it.

I think we may persist in this disagreement for some time ;-)

]]>Yep, as I said then: https://statmodeling.stat.columbia.edu/2016/11/08/election-forecasting-updating-error-ignored-correlations-data-thus-producing-illusory-precision-inferences/#comment-343161

My impression is that modelers don’t put a “the whole process could have a bias that’s normally distributed at +- 5% or so” not because they don’t believe that’s true, but because if you do that you wind up with not much better than asking your aunt Greta who she thinks will win.

and the point is, if you look like a coin-flip, people won’t pay you because they can flip coins themselves. They want some kind of “certainty” and also some kind of “drama” (horserace).

If you ignore what the pollers tell you are the standard errors of their polls, and you say to yourself “it’s plausible that *all* the polls are biased one way or another, and each is a random realization of this biased process” then you’d start with a prior like maybe beta(5,5) for the true underlying fraction of people voting for say Clinton, and something like a normal(0,.05) truncated to [-1,1] model for the bias, and then something like normal noise in the biased polls with +- 5% margins.

outcome = normal(underlying+bias,noise)

you’d run your Bayesian model, and discover that there’s basically no information you can extract about the bias separate from the underlying (ie. it’s not identifiable), so at the end the bias is still normal(0,0.05) and the polls are polling around Clinton +2%, but with a 5% bias either way, it’s all very consistent with “you don’t know jack”.

Under that kind of model, you’d probably have gotten something like 55/45 Clinton/Trump instead of something like 90/10 or 80/20 like people were predicting.
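Here is a back-of-the-envelope version of that calculation in Python. It is a sketch under simplifying assumptions, not a fitted Bayesian model: the beta(5,5) prior is approximated by a normal with mean 0.5 and sd 0.15, the bias is treated as shared by all polls, and the poll average (0.51), number of polls (20), and per-poll noise sd (0.025) are hypothetical inputs chosen to resemble "Clinton +2%".

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(poll_mean, n_polls, noise_sd, bias_sd,
                    prior_mean=0.5, prior_sd=0.15):
    """Posterior probability that the underlying share exceeds 0.5.

    The poll average is modelled as underlying + shared bias + averaged noise,
    so its variance is bias_sd**2 + noise_sd**2 / n_polls. The shared bias
    does not average out, no matter how many polls there are.
    """
    likelihood_var = bias_sd ** 2 + noise_sd ** 2 / n_polls
    prior_var = prior_sd ** 2
    posterior_var = 1.0 / (1.0 / prior_var + 1.0 / likelihood_var)
    posterior_mean = posterior_var * (
        prior_mean / prior_var + poll_mean / likelihood_var
    )
    return 1.0 - normal_cdf((0.5 - posterior_mean) / math.sqrt(posterior_var))


# Hypothetical inputs: 20 polls averaging a 51% two-party share.
p_with_bias = win_probability(poll_mean=0.51, n_polls=20, noise_sd=0.025, bias_sd=0.05)
p_no_bias = win_probability(poll_mean=0.51, n_polls=20, noise_sd=0.025, bias_sd=0.0)

print(f"allowing a shared bias: {p_with_bias:.2f}")
print(f"ignoring shared bias:   {p_no_bias:.2f}")
```

Under these assumptions the shared-bias version lands in the high 50s, while dropping the bias term produces the kind of confident number people were publishing. The exact figures depend entirely on the assumed inputs.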

]]>Right, but insert the cliche about models being wrong here. But we can be more nuanced than that: models can be more wrong in some aspects than in others. Thus the idea that posterior predictive checks should be focused on the summary statistics you most care about getting right. Or rather the idea of doing multiple posterior predictive checks to see what aspects of the data generating process the model does a good job of describing and what aspects the model does a poor job of describing. I get your points about non-response bias, etc. You’re saying that we had more uncertainty in the true state of public opinion, so it seems to me you’re making more of an argument that the election simulations (which is ultimately where I think Nate comes up with his topline number) should’ve been more dispersed. Right? But I still don’t see the median of all those election simulations being at 50%.

So to speak specifically about a detailed election forecasting model: our model might be wrong in the vote share or turnout of a particular demographic group, but it could still be good at getting the overall vote share for a particular candidate. That’s why I said, “narrowly focusing on the 32% probability.” To me, 25 – 35% feels about right. Donald Trump winning was as much of a shocker as a Clinton landslide (something like taking PA, MI, WI, AZ, FL, NC, OH, GA plus maybe even SC and TX) would’ve been.

“With appropriate assumptions you might say something like 90 +- 15% are white (obviously with a long left tail).” So 105% could be white? ;)

This is an aside, but in my area of expertise (fisheries) we have people publishing survival estimates for fish passage at dams where the point estimate (not just some portion of the uncertainty interval) is greater than 1. So apparently dams create fish. Those models are obviously wrong, and yet they keep pumping them out.

]]>Daniel:

Some of this discussion came up on the blog, for example here and here.

]]>In a Bayesian model, it’s “wrong” if it doesn’t represent the state of information we actually thought it should be representing.

In my opinion, we had information that said that polls were correlated, had serious nonresponse issues, that phone polling in general was problematic, and that some panel surveys had strange participants (unrepresentative).

When the outcome is essentially binary as it is in the US election system, then a state of information like 70/30 should be thought of as representing a fair amount of certainty that the 70 result will happen. And other pollers had more like 90/10. It was clear to me that 90/10 was overconfidence.

Basically the models were attributing more information to the results of polls than the polls actually had. Imagine you go out door to door and ask people their race. Suppose only 10% of people answer. Then you do a calculation using an assumption of simple random sampling and come up with something like 94 +- 2% of people are white. If it turns out that black or asian or etc families had reasons not to answer the door, then your model is over-confident because it makes poor assumptions. With appropriate assumptions you might say something like 90 +- 15% are white (obviously with a long left tail). The second model is “right” not because it gets the proportion of white people correct, but because it isn’t over-confident on what the value is.

However, with binary data, proportion and confidence are intimately tied because if p is the probability of the first thing to happen then 1-p is the probability of the second. 70/30 or especially 80/20 or 90/10 represented overconfidence in the information extractable from polls imho.
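The door-to-door example above can be put into numbers. This Python sketch assumes a true share of 90%, 5,000 doors knocked, and response rates of 11% for the majority group and 6% for everyone else; all of these figures are invented to roughly reproduce the "94 +- 2%" result and are not from any real survey.

```python
import math


def observed_share(true_share, response_rate_group, response_rate_other):
    """Expected share of the group among respondents under differential nonresponse."""
    responders_group = true_share * response_rate_group
    responders_other = (1.0 - true_share) * response_rate_other
    return responders_group / (responders_group + responders_other)


def naive_standard_error(share, n_respondents):
    """Standard error under the (wrong) assumption of simple random sampling."""
    return math.sqrt(share * (1.0 - share) / n_respondents)


true_share = 0.90           # hypothetical true population share
doors_knocked = 5000        # hypothetical number of households approached
response_rate_group = 0.11  # hypothetical response rate in the majority group
response_rate_other = 0.06  # hypothetical response rate among everyone else

overall_response_rate = (
    true_share * response_rate_group + (1.0 - true_share) * response_rate_other
)
n_respondents = round(doors_knocked * overall_response_rate)

estimate = observed_share(true_share, response_rate_group, response_rate_other)
half_width = 1.96 * naive_standard_error(estimate, n_respondents)

print(f"overall response rate: {overall_response_rate:.3f}")
print(f"naive estimate: {estimate:.3f} +- {half_width:.3f}")
print(f"truth inside naive interval: {estimate - half_width <= true_share <= estimate + half_width}")
```

The naive interval is tight and misses the truth, because the simple-random-sampling formula has no term for who chose not to answer. Widening the interval to admit that possibility is what the "90 +- 15%" version is doing.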

]]>I think there’s more of an argument here for this, given that losing the popular vote and winning the EC has happened before. So perhaps he has too much correlation among the states in his model. But still, events with 10% chance happen all the time.

Obviously, we can’t think about this from a frequentist point of view because, eww gross, but also because the 2016 election happens only once. But I’m still trying to wrap my head around a pre-election estimate of 32% probability or even 28.5% probability as “wrong”. On November 6th, 2016 Donald Trump winning seemed unlikely to just about everybody (including Trump’s campaign!) so clearly the probability should have been less than 50%. Hell maybe he even won because it seemed unlikely. Maybe a solid chunk of the electorate voted for him simply because they believed the polls and did it as sort of a protest. Maybe if they thought he actually would win they would have changed their votes.

Is there any research into feedback in polling and elections? Does the information gained and published through polling actually alter the thing being polled?

]]>Chris:

I think Nate’s objections to MRP are:

1. MRP is not magic, and it’s being sold (not by me, but by some people) as being magic. In particular, inference for demographic slices or geographic areas with small sample sizes will be inherently model-based, so it will work well when the model has good predictors but not otherwise.

2. MRP estimates are not raw data thus they shouldn’t be trusted, they’re in some sort of “uncanny valley” that makes Nate uncomfortable.

Point 1 is fair enough. MRP uses predictive modeling, and predictive modeling can give bad answers where the model is off.

But I think point 2 is misinformed. Nate uses reported toplines from polls; those toplines are based on survey adjustment. This adjustment could actually be MRP (as with Yougov) or it might be some crappy weighting method that could be improved if the survey orgs were using MRP. To oppose MRP (or, more generally, RPP) in such settings just seems foolish.

]]>Thanks for linkage Andrew! I don’t understand his negativity about MRP. At this stage, it seems decidedly preferable to just aggregating toplines…

]]>Chris:

MRP is something you do with raw data to adjust sample to population. It’s my impression that Nate does postprocessing of reported toplines, so he’s not analyzing survey responses directly. In 2016, some of the state polls did insufficient adjustment for nonresponse, hence analyses of reported summaries from state polls led just about all analysts to conclude that Clinton would probably win. Apparently, Trump’s campaign team thought Clinton was going to win, too. Nate was better than most of the others in that he had more uncertainty in the outcome.

It was possible to do better in 2016 using MRP on raw poll responses; see here.

]]>Most generally, as you add uncertainty to a binary forecast, your forecast should approach 50%. Non-response bias and turnout, from recollection, are huge sources of uncertainty, that need to be ‘adjusted’ for. What’s funny is I always assumed Nate was using MRP – but then, I am far from expert in this area.
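A small Python illustration of that first sentence, assuming the forecast is the probability that a normally distributed quantity clears the winning threshold. The 2-point lead and the list of uncertainty values are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(lead, uncertainty_sd):
    """P(true lead > 0) when the estimated lead has the given standard deviation."""
    return normal_cdf(lead / uncertainty_sd)


lead = 0.02  # hypothetical estimated lead over the winning threshold
uncertainty_sds = [0.01, 0.02, 0.05, 0.10, 0.50]
probabilities = [win_probability(lead, sd) for sd in uncertainty_sds]

for sd, p in zip(uncertainty_sds, probabilities):
    print(f"sd = {sd:.2f} -> win probability = {p:.3f}")
```

The point estimate never changes here; only the stated uncertainty does, and the forecast slides from near-certainty down toward a coin flip as that uncertainty grows.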

]]>I found my screenshot from 538 on election day morning. “Who will win the presidency?” Hillary: 71.4%, Trump 28.6%

]]>I have used similar expressions when talking about indirect estimates for non-inferiority analysis and more generally network meta-analysis.

The general impression seems to be that although the effect assessment is not randomized it is based on data. However, whether the assessment is done by formulas, likelihoods or classic Bayes, there is a supposition that certain relative effects that can’t be directly estimated/assessed would be the same if an omitted comparison group had been in the study.

Now it can be very sensible to make that supposition, but the implied likelihood (or approximate likelihood based formula) should be recognized as an informed prior and an appropriate Bayesian workflow followed to assess if its informativeness is appropriate and credible.

The bigger picture here, though, is that the cost/benefit of this argument is not very attractive – it’s hard for even many statisticians to grasp, and the extra uncertainty it brings out often is not that critical.

]]>Interestingly, Andrew’s blog had a good discussion on this same Atlantic piece about a month back: https://statmodeling.stat.columbia.edu/2019/04/07/research-topic-on-the-geography-of-partisan-prejudice-more-generally-county-level-estimates-using-mrp/#comments

]]>His model also had only a 10% chance Clinton wins the popular vote and loses the EC. It just wasn’t calibrated well, despite all the defensiveness on his part. It feels like a reckoning is still to be had, and it is good that we’re talking about MRP, but he’s been completely unapologetic and writing defensive screeds against the media and anyone in his path ever since. Has been a sad thing to see.

]]>values closer to 50% would have better reflected the fact that in the presence of serious non response and other biases the information available made it near impossible to call the election. this was the reality, but models weren’t correctly modeling this effect

]]>“Nate was wrong, of course”?

Narrowly focusing on the 32% probability of winning prediction, how was he “wrong”? This was an estimate prior to the unobserved event. As you pointed out, outcomes with a 32% or less probability happen all the time (like Kawhi Leonard’s vicious dagger to put away the Sixers, which had 32.1% chance of being made if taken by the average player based on its distance and how quickly a defender is closing in on the shot).

What would “right” look like on November 7, 2016? If you mean that the expected value of prediction is exactly equal to the outcome then only 100% or 0% forecast can be considered right.

]]>Second, and speculatively, my read of one of the major conceptual frames at 538 is that they have faith in pollsters not techniques. They’ve calculated “grades” and estimated bias for hundreds of different pollsters, and their basic aggregation method is to take toplines and then weight them in terms of those factors. Surely, some of this is born of necessity as you can’t usually get the raw data, but I think there’s actually a philosophical commitment here. 1) There’s a lot more to polling techniques than just the ultimate statistical analysis of the responses and 2) there are a lot of judgement calls at every step of the process. So, it may make sense to focus on the work of those who have performed well in the past without worrying too much about how they managed to do it (at least some element of which likely involves trade secrets). In that sense, if I can speculatively put words in Nate’s mouth, a good pollster has more value than a good statistical method.

]]>From memory (I have the screen shot somewhere) Nate gave Trump a 32% chance of winning. Most pollsters gave Trump about a 16% chance. Some dolts at the Huffington Post gave Trump a 1.8% chance, and 2 days before the election ran an article entitled “What’s Wrong With 538?”.

(At about the same time, the Cubs were down 3 games to 1 in the World Series, giving them about a 12.5% chance of winning a series they ultimately won. Stuff happens.)

Nate was wrong, of course, but “1 chance in 3” is more accurate than “1 chance in 6” or “1 chance in 50”.

Maybe he made less use of MRP type techniques than others, and concluded from this that he has a better mousetrap than MRP?

]]>But MRP can be used on more than just survey responses. It can be used whenever you have a multi-level regression and need to do some post-stratification. I have used it in finance, whereby you fit some multi-level model to asset returns at a particular point in time and then get group level estimates weighted by market capitalization instead of equally weighted. Same thing, no?

]]>More context: “Ugh, this is a bad use of MRP. (This tweet will have a very narrow audience.) The article makes it seem like the authors discovered *geographic* variation in this variable called political tolerance. But they don’t do that *at all*. They found *demographic* variation instead.” https://twitter.com/natesilver538/status/1103402077861146632?lang=en

]]>Fair point–once a method is “in the wild”, it is often treated as a kind of “default” that goes uncriticized (and, as you say, is treated as some kind of panacea).

I’d say in many scientific circles that Bayes factors have recently achieved this status, for example.

On the other hand, Silver’s words are also often treated uncritically by the non-statistical audience, so the onus is on him to explain his reasoning.

]]>If I were to play devil’s (Nate’s?) advocate, I might argue that, tone aside, Nate is not referring to how MRP is seen/used by actual social scientists for legit research. Perhaps Nate, as the editor of a data journalism site aimed at a lay audience, is mainly aware of and critiquing the use of MRP by authors of data journalism articles aimed at a lay audience. I guess I’d have to read the article (https://www.theatlantic.com/politics/archive/2019/03/us-counties-vary-their-degree-partisan-prejudice/583072/) to really know the context, but I’m not that good of an advocate. In any event, would you say that this very narrow, very generous interpretation holds up?

]]>> tunes the regression and poststratification formulas to “get the right answer”

Ah, thanks that helps me understand the target of his criticism. Though of course it could reasonably be leveled against literally any data analysis (after all, even computing a mean entails assumptions about measurement scales, how the data are partitioned, etc.). So, as you say, silly trash talk.

Personally, I appreciate that thinking in MRP terms forces me to confront and justify all that “tuning” and make it transparent. Funny how MRP is criticized for helping to lay bare the assumptions that are typically implicit.

]]>I wonder which is the Kurt Rambis of statistical methods?

]]>Wow! Thanks!

]]>Yup:

]]>