This sounds like conservatism on Nate’s part. He trusts traditional survey weighting because it’s been done before, it has stood the test of time. The trouble is that samples are getting worse and worse, hence there’s need to adjust for more and more predictors, and classical weighting starts to fall apart. I discuss some of this in my Struggles paper.

As is often the case in statistics, there is no safe harbor.

]]>More of Nate’s MRP comments are here:

https://youtu.be/Thf8weQSPrc?t=1668

Vavreck pushes him on his remarks at 33:00

“It feels like a lot of the MRP stuff hasn’t been backtested” seems like his main critique.

]]>basically a technique where you learn to predict the average results of various groups using regression on survey data, and then you figure out the average results of a full population by predicting using the regression equations for the *known* demographics of the whole population, rather than relying on the survey to accurately sample from every demographic group in the appropriate proportion.
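To make that description concrete, here is a minimal sketch in Python. It is not anyone's production MRP: the "regression" step is replaced by simple partial pooling of group means toward the overall mean (a crude stand-in for a multilevel model), and the function names, the `prior_strength` value, the toy survey, and the population counts are all made up for illustration.

```python
from collections import defaultdict

def partial_pool_means(responses, prior_strength=5.0):
    """Estimate each group's mean, shrunk toward the overall survey mean.

    responses: list of (group, y) pairs with y in {0, 1}.
    prior_strength: hypothetical tuning value; acts like that many
    pseudo-observations at the overall mean (a stand-in for a multilevel fit).
    """
    overall = sum(y for _, y in responses) / len(responses)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, y in responses:
        sums[group] += y
        counts[group] += 1
    return {
        g: (sums[g] + prior_strength * overall) / (counts[g] + prior_strength)
        for g in counts
    }

def poststratify(group_estimates, population_counts):
    """Weight group estimates by *known* population counts, not sample counts."""
    total = sum(population_counts.values())
    return sum(group_estimates[g] * n for g, n in population_counts.items()) / total

# Toy survey that over-samples "old" respondents (8 of 10).
survey = [("old", 1)] * 6 + [("old", 0)] * 2 + [("young", 1), ("young", 0)]
raw_mean = sum(y for _, y in survey) / len(survey)

estimates = partial_pool_means(survey)
# Assumed census-style counts: the population is mostly "young".
population = {"old": 30, "young": 70}
mrp_estimate = poststratify(estimates, population)

print(f"raw survey mean:        {raw_mean:.3f}")
print(f"poststratified estimate: {mrp_estimate:.3f}")
```

The poststratified number moves away from the raw mean toward the "young" estimate, because the population weights, rather than the lopsided sample, decide how much each group counts.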

]]>There’s some sports stuff here. And tons and tons of sports-related material here. Also lots of analyses I’ve never published.

]]>“It’s worse than that!” – hey now, I’d go for more sports topics on this blog:) It’s at least as important a topic as gremlins and the end of the world. ]]>

It’s worse than that! I also do a lot of sports analysis. If you want to talk about unimportant topics that statisticians are obsessed with, sports has got to top the list.

]]>MRP is something you do with raw data, to adjust the sample to match the population. An example is here. Another example is here. When Nate does poll analysis, I think he uses published summaries: rather than taking raw survey responses, he relies on whatever adjustments were done by the polling organizations. I understand this decision on Nate’s part: there are lots of polls out there that will publish summaries but not release their raw data—but the point is that he wouldn’t be using MRP for most of what he does. So maybe he thinks MRP is overrated because he doesn’t use it. The thing is, there’s a lot that can be learned from analysis of survey data. MRP is a really powerful tool.

]]>Nate Silver came up with a player projection system which he named after Carmelo:

https://fivethirtyeight.com/features/how-were-predicting-nba-player-career/

In that introduction he links to an article about Anthony he wrote:

https://fivethirtyeight.com/features/carmelo-anthonys-contract-could-doom-the-knicks-to-mediocrity/

Maybe he thinks Carmelo isn’t as bad as people think but still not worth his contract? But that article is old and Carmelo’s reputation has taken multiple hits since then.

Basically he went to the Thunder and couldn’t adjust his game to be the third option, then went to the Rockets and couldn’t adjust his game to be a role player (he still thought he was a star). So maybe being a Carmelo is thinking you should be the primary option when in reality you should be the fifth or sixth line of evidence?

]]>in my area of expertise (fisheries) we have people publishing survival estimates for fish passage at dams where the point estimate (not just some portion of the uncertainty interval) is greater than 1

Obviously not Bayesian models. This is exactly the kind of stuff we’ve had discussions about every time Frequentist vs Bayesian comes up. Bayesian models can’t even give intervals that overlap with impossible regions, provided you give the appropriate definition of impossible to the model.

]]>No, I’m suggesting that specifically for election models using data of the type being collected “these days”.

Things like “random digit dial, and most people have cell phones and ignore you” or “online polling” or whatever.

If you collect more reliable data, you could easily have predictions that *should* be something like 90/10, but it needs to be something you have strong reason to believe is truly reliable.

]]>I did that google and none of the instances of wtf were from me, except for this post right here.

]]>So of course google

“wtf andrew gelman”

and you can see for yourself how frequent that is. But perhaps more relevant is the unnecessary and ubiquitous “like.”

https://news.nationalgeographic.com/news/is-valley-girl-speak–like–on-the-rise-/

]]>Isn’t this something we could assess retrospectively? Obviously not if we only look at the 2016 election, but suppose we took all the cases where we had some facsimile of the kind of polling data that we had in 2016. We’d probably have to extend beyond presidential elections. But we could train models on the polling data using, say, the methods of 538 versus a model that introduces more forms of uncertainty, versus a coin flip. You’re suggesting any election model should be closer to a coin flip (or maybe I’m misrepresenting what you’re suggesting and you’re narrowly focused on the conditions present in the 2016 polls), while 538 is willing to be more precise than that. Given enough contests that have pre-election polling data that satisfy our criteria, we should be able to assess which models performed the best.

In fact didn’t 538 do something like this: https://projects.fivethirtyeight.com/checking-our-work/

]]>“polling methods that worked when phone calling was a very different activity will continue to be relevant and low bias today”

It could have by extreme fluke been the case that phone polling and other methods were perfectly fine and we just got some weird numbers throughout the whole of the lead-up to 2016, and if we persist in using those old polling methods we’ll do fine next time, but we knew, or should have known, that it wasn’t true.

]]>> In a Bayesian model, it’s “wrong” if it doesn’t represent the state of information we actually thought it should be representing.

Yes, wrong in the sense that if you persisted in using that represented state of information (that is, if you repeatedly did polls analysed using the same state of information) you would persist in being repeatedly wrong.

However, the one time it actually was done in the past, it could have by (extreme) fluke given a reasonable answer. Remember if a tortoise is also predicting the event you are predicting, it is not impossible to lose to it.

I think we may persist in this disagreement for some time ;-)

]]>My impression is that modelers don’t put a “the whole process could have a bias that’s normally distributed at +- 5% or so” not because they don’t believe that’s true, but because if you do that you wind up with not much better than asking your aunt Greta who she thinks will win.

and the point is, if you look like a coin-flip, people won’t pay you because they can flip coins themselves. They want some kind of “certainty” and also some kind of “drama” (horserace).

If you ignore what the pollers tell you are the standard errors of their polls, and you say to yourself “it’s plausible that *all* the polls are biased one way or another, and each is a random realization of this biased process” then you’d start with a prior like maybe beta(5,5) for the true underlying fraction of people voting for say Clinton, and something like a normal(0,.05) truncated to [-1,1] model for the bias, and then something like normal noise in the biased polls with +- 5% margins.

outcome = normal(underlying+bias,noise)

you’d run your Bayesian model, and discover that there’s basically no information you can extract about the bias separate from the underlying (ie. it’s not identifiable), so at the end the bias is still normal(0,0.05) and the polls are polling around Clinton +2%, but with a 5% bias either way, it’s all very consistent with “you don’t know jack”.

Under that kind of model, you’d probably have gotten something like 55/45 Clinton/Trump instead of something like 90/10 or 80/20 like people were predicting.
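Here is a rough back-of-the-envelope version of that argument in Python. It is a simplification of the model described above, not a full fit: it ignores the beta(5,5) prior (treating it as roughly flat near 50%), uses a normal approximation, and assumes 10 polls averaging 51% for Clinton with a per-poll standard deviation of 0.025 (about a +- 5% margin). All of those numbers and the function names are my assumptions for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def win_probability(poll_mean, poll_sd, n_polls, bias_sd):
    """Approximate P(underlying share > 0.5) given the poll average.

    Sampling noise shrinks as polls accumulate; the shared bias does not,
    because every poll is a draw from the same biased process.
    """
    sampling_var = poll_sd ** 2 / n_polls
    total_sd = math.sqrt(sampling_var + bias_sd ** 2)
    return normal_cdf((poll_mean - 0.5) / total_sd)

# Trusting the polls' own standard errors: no shared bias term.
no_bias = win_probability(poll_mean=0.51, poll_sd=0.025, n_polls=10, bias_sd=0.0)

# Allowing a shared normal(0, 0.05) bias that averaging cannot remove.
with_bias = win_probability(poll_mean=0.51, poll_sd=0.025, n_polls=10, bias_sd=0.05)

print(f"no shared bias:   {no_bias:.2f}")    # roughly 0.9
print(f"with shared bias: {with_bias:.2f}")  # roughly 0.58
```

Under these assumed inputs the same poll average goes from something like 90/10 to something in the high 50s, which is the point: adding more polls does nothing to shrink the bias term.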

]]>So to speak specifically about a detailed election forecasting model: our model might be wrong in the vote share or turnout of a particular demographic group, but it could still be good with getting the overall vote share for a particular candidate. That’s why I said, “narrowly focusing on the 32% probability.” To me, 25 – 35% feels about right. Donald Trump winning was as much of a shocker as a Clinton landslide (something like taking PA, MI, WI, AZ, FL, NC, OH, GA plus maybe even SC and TX) would’ve been.

“With appropriate assumptions you might say something like 90 +- 15% are white (obviously with a long left tail).” So 105% could be white? ;)

This is an aside, but in my area of expertise (fisheries) we have people publishing survival estimates for fish passage at dams where the point estimate (not just some portion of the uncertainty interval) is greater than 1. So apparently dams create fish. Those models are obviously wrong, and yet they keep pumping them out.

]]>Some of this discussion came up on the blog, for example here and here.

]]>In my opinion, we had information that said that polls were correlated, had serious nonresponse issues, that phone polling in general was problematic, and that some panel surveys had strange participants (unrepresentative).

When the outcome is essentially binary as it is in the US election system, then a state of information like 70/30 should be thought of as representing a fair amount of certainty that the 70 result will happen. And other pollers had more like 90/10. It was clear to me that 90/10 was overconfidence.

Basically the models were attributing more information to the results of polls than the polls actually had. Imagine you go out door to door and ask people their race. Suppose only 10% of people answer. Then you do a calculation using an assumption of simple random sampling and come up with something like 94 +- 2% of people are white. If it turns out that black or asian or etc families had reasons not to answer the door, then your model is over-confident because it makes poor assumptions. With appropriate assumptions you might say something like 90 +- 15% are white (obviously with a long left tail). The second model is “right” not because it gets the proportion of white people correct, but because it isn’t over-confident on what the value is.
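A quick sketch of the door-to-door example in Python, with made-up numbers: I assume the true share is 90%, that 10% of white households answer the door, and that about 5.7% of everyone else does. None of those figures come from real data; they are chosen only to reproduce the "94 +- 2%" flavor of the example.

```python
import math

def observed_share(true_share, response_rate_a, response_rate_b):
    """Share of group A among respondents when the two groups respond at different rates."""
    respondents_a = true_share * response_rate_a
    respondents_b = (1.0 - true_share) * response_rate_b
    return respondents_a / (respondents_a + respondents_b)

def srs_standard_error(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)

true_share = 0.90                                   # assumed truth
share_seen = observed_share(true_share, 0.10, 0.0574)
se = srs_standard_error(share_seen, n=564)          # hypothetical number of respondents

print(f"observed share: {share_seen:.3f} +- {2 * se:.3f}")
print(f"true share:     {true_share:.3f}")
```

With these inputs the naive interval is about 94 +- 2% and does not come close to covering 90%. The standard error formula is fine; the assumption that respondents are a simple random sample is what fails.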

However, with binary data, proportion and confidence are intimately tied because if p is the probability of the first thing to happen then 1-p is the probability of the second. 70/30 or especially 80/20 or 90/10 represented overconfidence in the information extractable from polls imho.

]]>Obviously, we can’t think about this from a frequentist point of view because, eww gross, but also because the 2016 election happens only once. But I’m still trying to wrap my head around a pre-election estimate of 32% probability or even 28.5% probability as “wrong”. On November 6th, 2016 Donald Trump winning seemed unlikely to just about everybody (including Trump’s campaign!) so clearly the probability should have been less than 50%. Hell, maybe he even won because it seemed unlikely. Maybe a solid chunk of the electorate voted for him simply because they believed the polls and did it as sort of a protest. Maybe if they thought he actually would win they would have changed their votes.

Is there any research into feedback in polling and elections? Does the information gained and published through polling actually alter the thing being polled?

]]>I think Nate’s objections to MRP are:

1. MRP is not magic, and it’s being sold (not by me, but by some people) as being magic. In particular, inference for demographic slices or geographic areas with small sample sizes will be inherently model-based, so it will work well when the model has good predictors but not otherwise.

2. MRP estimates are not raw data thus they shouldn’t be trusted, they’re in some sort of “uncanny valley” that makes Nate uncomfortable.

Point 1 is fair enough. MRP uses predictive modeling, and predictive modeling can give bad answers where the model is off.

But I think point 2 is misinformed. Nate uses reported toplines from polls; those toplines are based on survey adjustment. This adjustment could actually be MRP (as with Yougov) or it might be some crappy weighting method that could be improved if the survey orgs were using MRP. To oppose MRP (or, more generally, RPP) in such settings just seems foolish.

]]>MRP is something you do with raw data to adjust sample to population. It’s my impression that Nate does postprocessing of reported toplines, so he’s not analyzing survey responses directly. In 2016, some of the state polls did insufficient adjustment for nonresponse, hence analyses of reported summaries from state polls led just about all analysts to conclude that Clinton would probably win. Apparently, Trump’s campaign team thought Clinton was going to win, too. Nate was better than most of the others in that he had more uncertainty in the outcome.

It was possible to do better in 2016 using MRP on raw poll responses; see here.

]]>I have used similar expressions when talking about indirect estimates for non-inferiority analysis and more generally network meta-analysis.

The general impression seems to be that although the effect assessment is not randomized it is based on data. However, whether the assessment is done by formulas, likelihoods or classic Bayes, there is a supposition that certain relative effects that can’t be directly estimated/assessed would be the same if an omitted comparison group had been in the study.

Now it can be very sensible to make that supposition, but the implied likelihood (or approximate likelihood based formula) should be recognized as an informed prior and an appropriate Bayesian workflow followed to assess if its informativeness is appropriate and credible.

The bigger picture here, though, is that the cost/benefit of this argument is not very attractive – it’s hard for even many statisticians to grasp and the extra uncertainty it brings out often is not that critical.

]]>Narrowly focusing on the 32% probability of winning prediction, how was he “wrong”? This was an estimate prior to the unobserved event. As you pointed out, outcomes with a 32% or less probability happen all the time (like Kawhi Leonard’s vicious dagger to put away the Sixers, which had 32.1% chance of being made if taken by the average player based on its distance and how quickly a defender is closing in on the shot).

What would “right” look like on November 7, 2016? If you mean that the expected value of prediction is exactly equal to the outcome then only 100% or 0% forecast can be considered right.

]]>Second, and speculatively, my read of one of the major conceptual frames at 538 is that they have faith in pollsters not techniques. They’ve calculated “grades” and estimated bias for hundreds of different pollsters, and their basic aggregation method is to take toplines and then weight them in terms of those factors. Surely, some of this is born of necessity as you can’t usually get the raw data, but I think there’s actually a philosophical commitment here. 1) There’s a lot more to polling techniques than just the ultimate statistical analysis of the responses and 2) there are a lot of judgement calls at every step of the process. So, it may make sense to focus on the work of those who have performed well in the past without worrying too much about how they managed to do it (at least some element of which likely involves trade secrets). In that sense, if I can speculatively put words in Nate’s mouth, a good pollster has more value than a good statistical method.

]]>From memory (I have the screen shot somewhere) Nate gave Trump a 32% chance of winning. Most pollsters gave Trump about a 16% chance. Some dolts at the Huffington Post gave Trump a 1.8% chance, and 2 days before the election ran an article entitled “What’s Wrong With 538?”.

(At about the same time, the Cubs were down 3 games to 1 in the World Series, giving them about a 12.5% chance of winning a series they ultimately won. Stuff happens.)

Nate was wrong, of course, but “1 chance in 3” is more accurate than “1 chance in 6” or “1 chance in 50”.
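One way to put a number on "more accurate" for a single event is a proper scoring rule. The sketch below scores the three forecasts quoted above (32%, 16%, 1.8% for Trump) against the outcome that happened, using the log score and the Brier score. The labels in the dictionary are mine, and one election is obviously a tiny sample, so treat this as an illustration of the scoring idea rather than a verdict on anyone's model.

```python
import math

# Probability each forecaster gave to the outcome that actually occurred.
forecasts = {
    "538": 0.32,
    "typical pollster": 0.16,
    "HuffPost": 0.018,
}

def log_score(p):
    """Negative log probability of the realized outcome (lower is better)."""
    return -math.log(p)

def brier_score(p):
    """Squared error between the forecast and the realized outcome (lower is better)."""
    return (1.0 - p) ** 2

scores = {name: (log_score(p), brier_score(p)) for name, p in forecasts.items()}

for name, (ls, bs) in scores.items():
    print(f"{name:17s} log score {ls:.2f}   Brier {bs:.2f}")
```

Both rules rank the forecasts the same way here, and the log score punishes the 1.8% forecast especially hard, since it grows without bound as the stated probability of what actually happened goes to zero.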

Maybe he made less use of MRP type techniques than others, and concluded from this that he has a better mousetrap than MRP?

]]>But MRP can be used on more than just survey responses. It can be used whenever you have a multi-level regression and need to do some post-stratification. I have used it in finance, whereby you fit some multi-level model to asset returns at a particular point in time and then get group level estimates weighted by market capitalization instead of equally weighted. Same thing, no?

]]>I’d say in many scientific circles that Bayes factors have recently achieved this status, for example.

On the other hand, Silver’s words are also often treated uncritically by the non-statistical audience, so the onus is on him to explain his reasoning.

]]>If I were to play devil’s (Nate’s?) advocate, I might argue that, tone aside, Nate is not referring to how MRP is seen/used by actual social scientists for legit research. Perhaps Nate, as the editor of a data journalism site aimed at a lay audience, is mainly aware of and critiquing the use of MRP by authors of data journalism articles aimed at a lay audience. I guess I’d have to read the article (https://www.theatlantic.com/politics/archive/2019/03/us-counties-vary-their-degree-partisan-prejudice/583072/) to really know the context, but I’m not that good of an advocate. In any event, would you say that this very narrow, very generous interpretation holds up?

]]>Ah, thanks that helps me understand the target of his criticism. Though of course it could reasonably be leveled against literally any data analysis (after all, even computing a mean entails assumptions about measurement scales, how the data are partitioned, etc.). So, as you say, silly trash talk.

Personally, I appreciate that thinking in MRP terms forces me to confront and justify all that “tuning” and make it transparent. Funny how MRP is criticized for helping to lay bare the assumptions that are typically implicit.

]]>I wonder which is the Kurt Rambis of statistical methods?

]]>