Updating fast and slow

Paul Campos pointed me to this post from a couple of days ago in which he wrote:

I think it’s fair to say that right now the consensus among elite observers across the ideological spectrum . . . is that the presidential race is over because Donald Trump has no chance of winning — or rather his chances of winning are so slim that they can be treated as functionally zero for all practical purposes. . . .

Which raises this question: why do the instruments (in the form of various predictive mechanisms) not agree? For example, Nate Silver’s methodology still gives Trump a nearly one in five chance of winning. The betting markets, in which people put their money where their mouths are, aren’t as bullish, but they still give Trump a one in ten chance. . . .

So what’s going on? Are the various predictive mechanisms flawed? Or does Trump still have a very real chance?

I replied:

Nate puts himself out there so it’s only fair that he gets criticized, but really I think the problem of updating priors given data is super-tough, and it’s easy to make mistakes.

On one hand, try to use your best judgment and you find yourself subject to all sorts of cognitive biases and perverse incentives (such as Nate not wanting to underestimate Trump twice, or Nate underreacting to polls in part in reaction to earlier criticism that he was overreacting to polls). On the other hand, if you “tie yourself to the mast” by using a preset algorithm, you can find yourself staring while your estimate goes off the cliff. It’s a delicate balance, and Nate’s in a tough position, putting out these forecasts every day.

I had this sort of thing in mind when writing that Warriors post yesterday. From the perspective of the judgment and decision making literature, the challenge is integrating new information at the appropriate rate: not so fast that your predictions jump up and down like a yo-yo (the fate of naive poll-watchers) and not so slow that you’re glued to your prior information (as happened with the prediction markets leading up to Brexit).

Given how hard it is to do this sort of information updating even in controlled settings such as sports, it’s no surprise that pundits, data journalists, and political scientists can have difficulty integrating new information with elections, where the N is lower and the rules of the game keep changing.

Is there any work in cognitive psychology on the rate at which people incorporate new information? I’ve read about the base-rate fallacy but I don’t recall seeing any studies of this dynamic process. It seems like an important topic.

43 thoughts on “Updating fast and slow”

  1. This dissertation might be of use: Bullock, J. G. (2007). Experiments on partisanship and public opinion: Party cues, false beliefs, and bayesian updating (Order No. 3267471). Available from ProQuest Dissertations & Theses Global. (304810798).

    • Is there any work in cognitive psychology on the rate at which people incorporate new information? I’ve read about the base-rate fallacy but I don’t recall seeing any studies of this dynamic process.

      Nothing in my dissertation speaks exactly to Andrew’s question. But at least a few essays in Judgment Under Uncertainty seem relevant: Ward Edwards’ essay on “cognitive conservatism,” and the essay before it, too.

      That said, Judgment Under Uncertainty is now 34 years old, and some of the essays in it are older still. There must be some newer research that is relevant.

  2. There are certainly some people across the ‘elite spectrum’ who hold the belief that 20% is effectively 0%, partly because they believe that nobody fights ‘lost causes.’ We can safely say that certain Southerners dispute this fundamental proposition. The expert forecasting community is watching this one carefully.

    • Part of the problem is that there’s no way to validate this model in terms of percent likelihoods. A 20% (or 10% or 1%) likelihood of Trump winning given the polling data as of right now suggests that 20% of campaigns in such a state 10 days before the end would be won by the trailing candidate. But the number of elections we have with anything even close to the amount and type of data available for 2016 is in the single digits. An American presidential election is a totally different kind of election from, e.g., Congressional elections, where thousands of times more cases are available. So what that percentage means in practice is anything but clear, nor is it clear what a Clinton win would mean for a 2020 model in which she has a similar lead with 10 days to go.

      • The Good Judgment Project (https://www.gjopen.com/faq) uses a Brier Score to measure the accuracy of their human predictors.

        “Brier Score: The Brier score was originally proposed to quantify the accuracy of weather forecasts, but can be used to describe the accuracy of any probabilistic forecast. Roughly, the Brier score indicates how far away from the truth your forecast was.

        The Brier score is the squared error of a probabilistic forecast. To calculate it, we divide your forecast by 100 so that your probabilities range between 0 (0%) and 1 (100%). Then, we code reality as either 0 (if the event did not happen) or 1 (if the event did happen). For each answer option, we take the difference between your forecast and the correct answer, square the differences, and add them all together. For a yes/no question where you forecasted 70% and the event happened, your score would be (1 – 0.7)² + (0 – 0.3)² = 0.18. For a question with three possible outcomes (A, B, C) where you forecasted A = 60%, B = 10%, C = 30% and A occurred, your score would be (1 – 0.6)² + (0 – 0.1)² + (0 – 0.3)² = 0.26. The best (lowest) possible Brier score is 0, and the worst (highest) possible Brier score is 2.

        To determine your accuracy over the lifetime of a question, we calculate a Brier score for every day on which you had an active forecast, then take the average of those daily Brier scores and report it on your profile page. On days before you make your first forecast on a question, you do not receive a Brier score. Once you make a forecast on a question, we carry that forecast forward each day until you update it by submitting a new forecast.”
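
        A minimal sketch in Python of the scoring rule described above (the function name brier_score is mine, not the Good Judgment Project’s code), reproducing the two worked examples in the quote:

```python
def brier_score(forecast, outcome):
    """Squared error of a probabilistic forecast.

    forecast: probabilities over the answer options (should sum to 1).
    outcome: 1 for the option that happened, 0 for every other option.
    """
    return sum((f - o) ** 2 for f, o in zip(forecast, outcome))

# Yes/no question, forecast of 70%, and the event happened.
print(round(brier_score([0.7, 0.3], [1, 0]), 2))          # 0.18

# Three options A/B/C, forecast 60%/10%/30%, and A occurred.
print(round(brier_score([0.6, 0.1, 0.3], [1, 0, 0]), 2))  # 0.26
```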

  3. “Is there any work in cognitive psychology on the rate at which people incorporate new information?” –>

    There has been work on this topic within the framework of reinforcement learning (for a quick description, see Montague et al., 2004, Nature – Box 1 on page 3 of the article). “Learning rates”, or updating rates, within these models control how quickly current outcomes are integrated with past outcomes to form a new expected value.

    The cool thing about these parameters is that low/high values can be good/bad depending on the environment wherein prediction is taking place. For example, if the environment is changing quickly, then having a high learning rate would be beneficial because it is able to quickly update the expected value. If the environment is stable, however, then a low learning rate is better – the expected value of future outcomes will then be more robust to noise.
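
    To make this concrete, here is a small simulation sketch (a toy setup of my own, not the model from the Montague et al. paper): a delta-rule learner tracks a Bernoulli reward probability, and which learning rate does better depends on whether the environment is stable or volatile.

```python
import random

def delta_rule_update(value, outcome, alpha):
    """One delta-rule step: move the expected value toward the observed
    outcome by a fraction alpha of the prediction error."""
    return value + alpha * (outcome - value)

def tracking_error(alpha, reward_probs):
    """Mean squared gap between the learner's expected value and the
    true (possibly changing) reward probability."""
    value, errors = 0.5, []
    for p in reward_probs:
        errors.append((p - value) ** 2)
        outcome = 1 if random.random() < p else 0
        value = delta_rule_update(value, outcome, alpha)
    return sum(errors) / len(errors)

random.seed(0)
stable = [0.7] * 2000                        # reward probability never changes
volatile = ([0.9] * 10 + [0.1] * 10) * 100   # flips every 10 trials
for name, env in [("stable", stable), ("volatile", volatile)]:
    for alpha in (0.1, 0.4):
        print(f"{name:8s} alpha={alpha}: error {tracking_error(alpha, env):.3f}")
```

    With these made-up numbers, the lower learning rate gives the smaller error in the stable environment, and the higher one wins once the environment keeps flipping.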

      • From the abstract in the link:

        “We show that human subjects assess volatility in an optimal manner and adjust decision-making accordingly. This optimal estimate of volatility is reflected in the fMRI signal in the anterior cingulate cortex (ACC) when each trial outcome is observed.”

        I’m rather skeptical that fMRI signals could be so noise-free as to “reflect” something like “an optimal estimate of volatility.”

        • Given that my layman’s understanding of the fMRI interpretation issue suggests that fMRI signals are somewhat similar to tarot readings, I think I agree.

          I think I’d trust the animal studies. Not that I’d know, but it is very difficult to bribe a rat.

        • Yes this is a huge problem in “model-based fMRI” literature. In the methods that were typically used for the first 10 or so years people were doing this, researchers would fit a model to the behavioral data and then look for fMRI signals that correlated with parameters of interest, not taking into account the noise in the signal at all. Finally people started realizing the issues with doing this (http://www.princeton.edu/~rcw2/papers/WilsonEtAl_RLDM2013_fMRI.pdf) and more recently there have been attempts to jointly fit models in a way that takes fMRI noise as well as behavioral data noise into account (http://faculty.psy.ohio-state.edu/turner/lab/documents/overview_final.pdf).

          That being said, this doesn’t negate the fact that the model fit to the behavioral data in the Behrens paper does support the idea that human subjects “assess volatility in an optimal manner and adjust decision-making accordingly.” As with most cognitive neuroscience, the fMRI data is only of secondary interest (but does get you a fancy Nature Neuroscience paper!)

        • That still leaves the question of what the authors mean by “assess volatility in an optimal manner and adjust decision-making accordingly”. I’m not enthusiastic enough to read the paper to find out, but if someone thinks it’s worthwhile to explicate that here, I might be willing to read what they say and consider it on its merits.

        • Well, exactly as Nathaniel said above (which is why I posted this paper). They put people in fast changing and slow changing environments and fit a standard delta rule model to the data. In quickly changing environments, subjects had a higher learning rate. In slower changing environments, subjects had a lower learning rate.

        • Anonymous:

          I’m trying to connect what you said with what Nathaniel said above.

          Are you referring to his statement, “if the environment is changing quickly, then having a high learning rate would be beneficial because it is able to quickly update the expected value. If the environment is stable, however, then a low learning rate is better – the expected value of future outcomes will then be more robust to noise”? In other words, is this sentence defining what is meant by “optimal”?

        • Yes agreed that the abstract is awful (as it probably has to be for a Science paper). But all they are saying is that in a very simple perceptual decision-making task (respond with whether you heard more clicks on the right or left within a short interval) the noise in responding was due to perceptual noise (in hearing the clicks) rather than noise in the accumulator (a running tally of which side had a higher number of clicks). This suggests that on very short time scales with very simple forms of belief updates, the noise does not come from the updating process itself. A very partial answer to Andrew’s question (although not on any time/computational complexity scale that he would be interested in).

        • Thanks for the explanation. However, I’m not convinced that “This suggests that on very short time scales with very simple forms of belief updates, the noise does not come from the updating process itself.” It may simply be that there was so much noise in hearing the clicks that it overwhelmed any other kind of noise.

  4. Ned Augenblick and Matthew Rabin do exactly that in their paper “Testing for Excess Movement in Beliefs”. They first develop a statistical criterion that detects excess movement in belief streams (based on the idea that if you’re 95% sure today, e.g., then that puts an upper bound on how much your belief can move in the future, in expectation). They then apply it to test the predictions of both professional forecasters, and betting markets on sports games.

    • Sandro:

      I’m actually interested in this more in the context of classic uncertainty judgments rather than economic settings such as betting markets; that said, I look forward to seeing the data analysis that you’re referring to. (I couldn’t find the paper on Google.) Interesting that you mention excess movement but not updating that is too slow; perhaps that’s the subject of a different paper?

  5. No, they can’t identify updating that’s too slow; they can only identify excess movement. The reason is that their test is agnostic about the data generating process. (It is based on the fact that with two states of the world, in expectation, the sum of squared changes of a resolving belief stream must equal prior*(1-prior).) Hence, low variation in beliefs might simply result from very little information arriving.
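
    As a sanity check on that identity, here is a small simulation sketch (my own construction, not code from the Augenblick and Rabin paper): a Bayesian belief about a binary state is updated on noisy signals and then resolved, and the average sum of squared belief changes comes out near prior*(1-prior).

```python
import random

def squared_belief_movement(prior=0.3, accuracy=0.7, n_signals=20):
    """Simulate a Bayesian belief stream about a binary state and return the
    sum of squared period-to-period changes, including the final jump when
    the truth is revealed."""
    theta = 1 if random.random() < prior else 0   # true state drawn from the prior
    belief, total = prior, 0.0
    for _ in range(n_signals):
        signal = theta if random.random() < accuracy else 1 - theta
        like1 = accuracy if signal == 1 else 1 - accuracy   # P(signal | theta = 1)
        like0 = accuracy if signal == 0 else 1 - accuracy   # P(signal | theta = 0)
        new_belief = belief * like1 / (belief * like1 + (1 - belief) * like0)
        total += (new_belief - belief) ** 2
        belief = new_belief
    total += (theta - belief) ** 2   # resolution: the belief jumps to 0 or 1
    return total

random.seed(1)
sims = [squared_belief_movement() for _ in range(100_000)]
print(sum(sims) / len(sims))   # should be close to 0.3 * (1 - 0.3) = 0.21
```

    Systematically more movement than that is the kind of excess volatility their test picks up; systematically less is consistent with slow updating, but, as noted above, also with very little information arriving.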

  6. Lots of work on this dating back to Ward Edwards’ original studies on conservatism (maybe the original JDM paper?): http://psycnet.apa.org/index.cfm?fa=buy.optionToBuy&id=1966-11887-001
    There might not be a simple answer, though; it seems highly dependent on the context, i.e. payoffs or response mechanism in the Edwards paper or causal explanation in a more recent example (https://psych.nyu.edu/rehder/Hayes%20Hawkins%20Newell%20%20Pasqualino%20Rehder%20COGNIT-D-14-00326%20R.pdf).

  7. Since people process new information over some period, and are influenced by how their associative groups process that information – groups in which they may be on the leading or trailing edge of the processing – if that period extends past an arbitrary cutoff like election day, then there’s added uncertainty. An odd thing, for example, with the FBI letter is that it appears they can’t actually process the information within the cutoff, so there’s a piece of information that may or may not have actual value which people – meaning real undecideds or those genuinely vacillating – can only assign a sort of prior weight to, in which they think “it’s worth this much if the emails are x” and “it’s worth this much if the emails are y.” How do you model that when you’re interested in the relatively small in-between group, except post hoc when you ask “how much did this matter?” and “would you have changed your vote if y were true instead of x?”

    I spent last night with a relative who watched a lot of anti-Hillary videos in the car. I’d say (anecdotally) from that, from looking at Facebook and other posts, etc. that most of the reaction is from the committed anti-Hillary or the committed pro-Hillary and the committed anti-Trump and pro-Trump groups (because anti- and pro- aren’t identical). These groups process information at a faster rate, I would imagine, than people who aren’t fitting the new information to a specific set of conditions. One could argue they aren’t processing it at all, that they’re just immune to new information, which gets at the question asked about the rate at which one processes. I compare it to walking down the stairs under different circumstances: eyes open, nothing in your hands, looking down, versus hands full or even eyes closed so that you need to feel for the step, or at least for the last step or two. In the former, you’re processing the information the stairs convey without effort, even without recognizable consciousness, but in the latter it takes much more effort and relatively much longer. Going down with open eyes, etc. is like being firmly committed to Hillary or Trump.

  8. Is there an equivalent to this question for science? That is, do we optimize the advance of science in a world in which studies are filtered more rigorously for statistical validity (no more himmicanes or shark attacks) and we update quickly on findings? Or are we better off with PNAS doing its thing but observers updating slowly? It seems like the answer would depend on the value of variance and incentives for search in different directions. Honestly, when it’s framed this way I wonder if science won’t be more fruitful in a world with himmicanes than in a world of greater statistical rigor.

    Though, hopefully obviously, in this latter case there’s almost certainly an optimal third way in which statistical speculation is allowed but is discussed in a careful fashion that recognizes common fallacies like researcher degrees of freedom.

    • Charlie:

      For the himmicanes, sure, little harm was done; same with the ESP paper, as it’s not like anyone believes that anyway. For embodied cognition, statistical errors seem to have wasted thousands of years of researcher time over the past decades. And for medical research, I have no idea, but I’m guessing a lot of resources have been wasted, both on pursuing false leads and on doing studies that never had a chance of being more than noise.

      My point here is that statistical overconfidence of the himmicanes variety is not always just a harmless pursuit of off-the-wall ideas (and, I agree, off-the-wall ideas are worth pursuing); it also diverts resources away from more serious research.

      Put it another way: imagine that 50% of academic physics research effort was going into, umm, I dunno, cold fusion and N-rays. That would be a problem, no? That’s kinda what social psychology was like, or at least that’s what it looks like from the outside.

      I agree with your final point that speculation is just fine if all is open, if data are shared. Everything should be published, and publication should not be such a big deal. It’s the Arxiv model.

  9. Why can’t it be something like, “what’s the chance of some damning info. being unearthed about Hillary” or “the chance of Hillary saying something stupid”?

    Mustn’t one give some weight to all such freak events? Given how fast info. can spread these days, a 1 in 10 chance given to the union of all such potential upset scenarios doesn’t sound too bad, does it?

  10. Tetlock discusses to some extent in his new book how fast superforecasters update their beliefs.

    I didn’t find it very useful, but you may think otherwise. To me it was just a bunch of cases with no theory. But you may take some insight from it. The cases are interesting.

  11. A guy like Silver, who makes dozens of predictions on his site, can clearly differentiate between a 0% chance, a 20% chance, a 75% chance, etc. But a political talking head probably has just three prediction states: A will win (0%), B will win (100%), or it’s a toss-up (50%). So it might not be about updating priors, just people using simpler prediction models.

  12. I’ll be at Columbia on Wednesday, Nov 2nd, to speak to the student statistics seminar (about election trading!). I’ll probably have something to say about this (mostly that it’s hard, but some things can be done).

  13. Nate’s not looking so retrograde now, I think? His model accounted for possible changes in the polls *after the date his prediction is published.* And now the polls are changing due to this latest FBI email letter (and probably other reasons). Do we live in a world where 4 out of 5 elections standing where they stood 2 weeks ago have no subsequent scandals that damage the front-runner? Sounds about right to me. Kudos to Nate for his model seeming to work correctly *again*. Obviously Clinton is still the front-runner, but it’s clearly not a slam-dunk, in-the-tank win for her anymore.

    • Steve:

      I doubt the latest news has changed many voters’ opinions but I suspect it’s motivated more Trump supporters to respond to polls. As we’ve discussed elsewhere on this blog, voter turnout in the election is around 60% but response rates to surveys are under 10%. Responding to a poll is much more optional than voting, and thus it makes sense that all sorts of news events that won’t affect vote preferences or turnout much, can still cause swings in the polls via differential nonresponse.
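
      To put rough numbers on that, here is a toy calculation (hypothetical shares and response rates of my own, not estimates from any survey): holding vote preferences fixed, a small shift in which side feels like answering the phone moves the poll margin by several points.

```python
def poll_margin(clinton_share, trump_share, clinton_resp, trump_resp):
    """Clinton-minus-Trump margin (in points) among poll respondents, given each
    group's share of the electorate and its probability of taking the poll."""
    c = clinton_share * clinton_resp
    t = trump_share * trump_resp
    return 100 * (c - t) / (c + t)

# Hypothetical two-party electorate that never changes: 52% Clinton, 48% Trump.
before = poll_margin(0.52, 0.48, clinton_resp=0.09, trump_resp=0.09)
# After bad news for Clinton, suppose Trump supporters become slightly more
# likely to respond (9% -> 10%) while actual preferences stay the same.
after = poll_margin(0.52, 0.48, clinton_resp=0.09, trump_resp=0.10)
print(f"margin before: {before:+.1f}, after: {after:+.1f}")   # roughly +4.0 -> -1.3
```

      So a one-point change in one side’s response rate moves the measured margin by about five points with zero change in underlying support.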

      • Well said, Andrew, and thanks for replying. My main point is that Nate’s model takes into account (from what I understand by listening to what he and Harry Enten say about it) some future statistical uncertainty in the polls. They aren’t just interested in the error bars on the polls; they’re interested in the possibility that there can be movement in the polls due to future events (and of course they assume that the poll swings will represent actual vote swings). As the election gets closer, their predictions get more locked in (i.e. they become more confident that the current polling average is what we’re going to see on election day).

        That seems like a partial answer to the fact that the public polls simply can’t reach so many voters (as you say), and that the polls are more erratic than true voter sentiment. Averaging out a bunch of polls (and weighting them based on methods/quality) also seems like a good way to reduce poll swing fever (something they also do I think).

        I feel like Nate and Harry are doing a better job than anyone else in trying to use the available data on the election and making predictions based on it.

        My main point is that while many predictions started “locking up” the race for Hillary b/c of all the bad news for Trump, they were basically baking in some assumption that because things were going badly for Trump they would continue to do so. And with 2 weeks left, things have suddenly started going very badly for Clinton. She’s still the favorite, but I like that Nate’s model left a bit of daylight for Trump, and it seems like we’re seeing some of that daylight now.

  14. Thanks. One of those links leads to Nate’s 2011 write-up of same (http://fivethirtyeight.blogs.nytimes.com/2011/08/31/despite-keys-obama-is-no-lock/?_r=0), and this gem:

    “Mr. Lichtman, for instance, scored Mr. Obama as charismatic in 2008… However, Mr. Lichtman does not score Mr. Obama that way now.” (now == 2011)

    Wow. Even some of the social psychology research you’ve written about doesn’t go this far. It’s one thing to have a somewhat subjective charisma factor. It’s another to re-cast the same dude with a different score in different elections! If that’s not fitting data to match your expectations, I don’t know what is. Thanks again for the insight!

  15. “I think the problem of updating priors given data is super-tough, and it’s easy to make mistakes”. Yup, it’s very easy to make mistakes and this time it seems that the mistake was made by those who gave “functionally zero” chances of winning to Trump – looks like Nate’s forecast wasn’t so off, apparently.
