Should we judge pundits based on their demonstrated willingness to learn from their mistakes?

Palko writes:

Track records matter. Like it or not, unless you’re actually working with the numbers, you have to rely to some degree on the credibility of the analysts you’re reading. Three of the best ways to build credibility are:

1. Be right a lot.

2. When you’re wrong, admit it and seriously examine where you went off track, then…

3. Correct those mistakes.

I [Palko] have been hard on Nate Silver in the past, but after a bad start in 2016, he did a lot to earn our trust. By the time we got to the general election, I’d pretty much give him straight A’s. By comparison, there were plenty of analysts who got straight F’s, and a lot of them are playing a prominent role in the discussion this time around. . . .

This seems like good advice from Palko. The difficulty is in applying it. With the exception of the admirable Nate Silver (no, Nate’s not perfect; nobody is; like Bill James he sometimes has fallen into the trap of falling in love with his own voice and making overly strong pronouncements; sometimes he just mouths off about things he knows nothing about; but, hey, so do we all: overall Nate is sane and self-correcting, even if recently he’s been less than open about recognizing where his work has problems, perhaps not realizing that everyone’s work has problems and there’s no shame in admitting and correcting them), I don’t know that there are any pundits who regularly assess their past errors.

Palko picks on NYT political analyst Nate Cohn, but really he could name just about anyone who writes regularly on politics who’s ever published an error. It’s not like Gregg Easterbrook is any better. So I’m not quite sure what can possibly be done here.

18 thoughts on “Should we judge pundits based on their demonstrated willingness to learn from their mistakes?”

  1. One thing these pundits could do is to include a certainty level with every prediction or forecast. Scott Alexander, of Slate Star Codex and its successor blog (Astral Codex Ten), liked to do that. He’d make a lot of predictions with certainty levels, and a year later rate them as having been right or not.

    He tended to find that if he gave a rating of, say, 70%, then on average about 70% of those predictions would work out. He hoped to be able to find his own biases this way so he could try to compensate for them.
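
    In code, the kind of calibration check Alexander ran might look like the minimal sketch below (the predictions here are invented for illustration, not his actual records): bucket the predictions by stated confidence and compare each bucket’s stated probability to its observed hit rate.

    ```python
    from collections import defaultdict

    # Hypothetical predictions: (stated probability of being right, whether it came true).
    predictions = [
        (0.5, True), (0.5, False),
        (0.7, True), (0.7, True), (0.7, False),
        (0.9, True), (0.9, True), (0.9, True), (0.9, False),
    ]

    # Group predictions by stated confidence level.
    buckets = defaultdict(list)
    for prob, outcome in predictions:
        buckets[prob].append(outcome)

    # A well-calibrated forecaster's 70% bucket should be right about 70% of the time.
    for prob in sorted(buckets):
        outcomes = buckets[prob]
        hit_rate = sum(outcomes) / len(outcomes)
        print(f"stated {prob:.0%}: {hit_rate:.0%} correct over {len(outcomes)} predictions")
    ```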

  2. “I [Palko] have been hard on Nate Silver in the past, but after a bad start in 2016, he did a lot to earn our trust.”

    In the spirit of self-correction, can someone explain to me how Nate had a bad start in 2016? Does this have something to do with the 2016 election? The one where Nate Silver was one of the few who gave Trump a reasonable chance of winning? If I remember correctly, tabloids like the NYT gave Trump a ~1% chance of winning whereas Nate came out near the top performance-wise. What I remember most about the whole debacle was going from not knowing who Nate was prior to 2016 to having a great amount of respect for his election forecasting abilities during and after the 2016 election. What made 2016 a bad start for him?

    • Follow the link.

      This was written in November 2016. I was saying at the time that he had a bad start IN 2016, but that he had finished strong.

      Silver and company started the primaries with basically the same flawed analysis as the NYT and the rest, but when it became obvious that Trump was going to get the nomination, he took a hard look at his models and assumptions and made the changes you are referring to.

      Also note that Silver took huge heat at the time for breaking with the pack and sticking with that correct analysis.

      • Oops.

        The post linked to was from 2019 and it contained a repost from 2015.

        Just to be clear though, the comment about Silver getting ‘F’s applies ONLY to the primaries, while the ‘A’s start with the general election.

        • Mark, thank you for taking the time to reply. I completely misread the post, and I’m a little ashamed of myself for taking the time to write a comment without taking the time to actually digest the points you and Andrew were making. I apologize for writing such a stupid comment and wasting your time as well as Andrew’s.

        • I get where you were coming from. Silver doesn’t get nearly enough credit for breaking with the pack after the primaries and sticking to his guns despite the criticism. I frequently hear “Silver and the other data journalists got the 2016 election wrong” when he was the only one of the bunch to get it right in November.

    • Jordan:

      You write, “If I remember correctly, tabloids like the NYT gave Trump a ~1% chance of winning.” You do not remember correctly. The New York Times did not ever give Trump a ~1% chance of winning in 2016. Nate Silver notoriously gave Trump a 2% chance of winning the Republican nomination, and this was at a time when Trump was the top-polling Republican candidate—but, as Mark notes, Nate recognized his error and came to terms with it.

      • Hi Andrew,

        My comment was stupid for multiple reasons and, as you point out, one of those reasons was my failure to go back and actually (dis)confirm my memory of the NYT prediction. I apologize for wasting your time and shitting in your comments section.

  3. The problem with scientists refusing to acknowledge and fix mistakes in their papers, and the problem with data journalists refusing to acknowledge and fix mistakes in their predictions, are not different, though the players and stakes may be. Likewise, journals that refuse to retract bad papers or to change their review practices are fundamentally the same as news outlets that continue to back bad pundits. These are just different extremes of a single spectrum of public claims. We are all making public claims about the state of reality, past or present or future, that rely to some degree on trust.

    On one end of the spectrum, there are claims that are highly reliant on empirical evidence and rational argument–which, in principle, can be evaluated and verified externally–but which also require a degree of trust that the speaker adheres to certain (scientific or journalistic) standards. On the other end, there are claims that rely mainly on things that can’t be checked–proprietary algorithms, confidential sources, the benefit of years of experience–plus maybe enough logic to hold together a folksy metaphor or historical analogy. This blog tends to be concerned primarily with problems on the empirical end, as when scientists and journals refuse to be transparent or self-correcting. The data journalists and modelers discussed here are closer to the subjective end, and journalists who cover science news closer still. Topics don’t usually reach the pure punditry end of the spectrum, except to discuss pundits who (mis)use scientific/empirical claims in arguments supporting their opinions, like David Brooks.

    So, if the problems are the same, maybe the solutions are the same. This blog does a great job of serving as a hub of intelligent criticism and public accountability, one that can selectively draw from the good resources on, say, Twitter, without dragging in the bad elements on the bottom of its shoe. Probably (or hopefully) there are other sites that serve the same function that I’m not aware of. Retraction Watch is an entire organization that provides more comprehensive accountability–like a nonpartisan Media Matters for journals. Maybe these are models that can be extended on the empirical side and replicated on the subjective side. The key seems to be getting a critical mass of intelligent, informed people interested in accountability and willing to give some of their time to contribute to it. And, of course, leadership and greater dedication from a few.

    As for original ideas, I think it would be great if there were an organization that officially recognizes adherence to standards, transparency, and self-correction, perhaps using the awards model. Better to have an award for being responsible and accountable, I think, than for making breakthroughs. If we had a Pulitzer for statistics or social sciences, that’s what I’d want it to be. The fact-checking trend in journalism is another nice model, but most pundits work for news(ish) organizations, so it’s unlikely David Brooks will ever get the same treatment as politicians when it comes to being called out for “misstatements.” Maybe a site that scores pundits’ predictions the way Nate Silver scores pollsters? (And where Nate evaluates pollsters’ bias toward a party, this site would evaluate pundits’ willingness to recognize and correct mistaken predictions.)
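
    For the prediction-scoring part of that idea, one off-the-shelf tool would be the Brier score, a standard proper scoring rule: the mean squared error between the stated probability and the 0/1 outcome. A minimal sketch, with invented pundits and numbers:

    ```python
    def brier_score(preds):
        """preds: list of (stated probability, outcome as True/False) pairs. Lower is better."""
        return sum((p - float(y)) ** 2 for p, y in preds) / len(preds)

    # Invented track records for two hypothetical pundits.
    pundit_a = [(0.8, True), (0.6, True), (0.9, True), (0.7, False)]  # confident, mostly right
    pundit_b = [(0.5, True), (0.5, True), (0.5, False), (0.5, True)]  # hedges everything

    print(brier_score(pundit_a))  # 0.175: rewarded for confident calls that came true
    print(brier_score(pundit_b))  # 0.25: the say-50%-about-everything baseline
    ```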

  4. When it comes to successful forecasting, it is hard to beat the famous German sociologist Max Weber. In a speech entitled “Science as a Vocation,” delivered at Munich University just as Germany was losing World War I and the Spanish flu pandemic was raging, he contrasted American football coaches with the rest of the faculty:
    “that of a hundred professors at least ninety-nine do not and must not claim to be football masters in the vital problems of life.”
    In this speech Weber also made this comparison between papal selection and the American presidency:
    “A counterpart are the events at the papal elections, which can be traced over many centuries and which are the most important controllable examples of a selection of the same nature as the academic selection. The cardinal who is said to be the ‘favorite’ only rarely has a chance to win out. The rule is rather that the Number Two cardinal or the Number Three wins out. The same holds for the President of the United States. Only exceptionally does the first-rate and most prominent man get the nomination of the convention. Mostly the Number Two and often the Number Three men are nominated and later run for election.”
    He predicted that, as far as Germany was concerned,
    “Hence academic life is a mad hazard. If the young scholar asks for my advice with regard to habilitation [i.e., obtaining tenure], the responsibility of encouraging him can hardly be borne. If he is a Jew, of course one says lasciate ogni speranza.”
    Note again that all of the above and more were said in 1917 or 1918 (the date is in dispute).

  5. I’ll grant you that Cohn is no worse than average here (possibly even better), but that’s a low bar for a publication of the influence and reputation of the NYT.

    I’d also push back on the “anyone who writes regularly on politics” standard. Josh Marshall has the best track record in the field, but when he makes a mistake he has no trouble admitting it.

  6. Well, here down under we still have pundits punditting even though we had a massive poll failure.
    No polling company has offered an explanation or an answer for that failure, but the pundits just keep on talking about the polls like nothing happened!

  7. Track records matter? One would hope. At least, I think that statement needs qualification. Jeane Dixon made a good living, I think.

    There is an apocryphal story I heard in my youth about a doctor who had a reputation for predicting the gender of babies. (This was before ultrasound and so on.) After he retired a reporter asked him his secret. He replied, “I always tell the couple what they want to hear. If I’m right, they are delighted. If I’m wrong, they forget.”

    Pundits tell their followers what they want to hear. If track records matter, I would love to see the evidence. That’s not rhetorical. I really would.

    • There was a more elaborate version of that in an IEEE Transactions on stochastic processes or some related sub-discipline, attested by the editor, an Indian of some eminence. I have scoured my references and never been able to find this thing, but the story went like this: in a certain village somewhere in India there was an elder whose reputation was known far and wide for his especial and much-attested ability to predict the sex of the unborn child. Expectant couples would come from great distances to see him. This was his trick, said the editor: when the couple saw him he would make a prediction, boy or girl, at random, and with great flourish make an entry in his books or pipe-roll or parchment or however it was handled, which was precisely the *opposite* of what he had told them. The pleased couple then went back to their village. In time, if the prediction came to pass, everyone learned of this, and the seer’s reputation went up another notch. If the prediction proved false, then either the couple, too ashamed to admit they’d been fooled, told everyone otherwise (no, the prediction was correct!), or else, if the couple were thick-skinned enough to take a whole day’s walk back to the village of the seer and complain, well then, he’d show them the entry in the book to prove they’d remembered wrong!
