In research, when should you tie yourself to the mast?

In statistics, there are two things we know not to do:

1. Keep screwing around with your data and analysis until you get the answer that you want. This is called p-hacking or researcher degrees of freedom or forking paths, and it’s a known strategy for getting government grants, papers in PNAS, keynote talks at psychology conferences, TED talks, NPR appearances, and . . . unreplicable results. (For a toy demonstration of how this inflates false positives, see the simulation sketch after this list.)

2. Use a flawed model and ride it all the way down to the inferno, never letting go of the reins even when the problems are obvious. That way lies the madness that is regression discontinuity analysis, so-called unbiased estimation, and, to be fair, Bayesian inference with really bad priors.
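To make the first failure mode concrete, here is a minimal simulation sketch. This is my own illustration, not anything from the post; the particular forking paths (an early-data subset, a post-hoc subgroup, a switch to a correlation test) are invented for the example. On pure-noise data, one pre-specified test keeps the false-positive rate near the nominal 5%, while running several analyses and reporting whichever gives the smallest p-value pushes it well above that.

    # Illustrative simulation: how "forking paths" inflate false positives.
    # The data are pure noise, so any "significant" result is a false positive.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_sims, n = 2000, 100

    def forked_p(y, x, covariate):
        """Try several analysis paths on the same data; keep the best (smallest) p-value."""
        pvals = [
            stats.ttest_ind(y[x == 1], y[x == 0]).pvalue,            # the pre-specified comparison
            stats.ttest_ind(y[x == 1][:40], y[x == 0][:40]).pvalue,  # "just look at the early data"
            stats.ttest_ind(y[(x == 1) & (covariate > 0)],
                            y[(x == 0) & (covariate > 0)]).pvalue,   # post-hoc subgroup
            stats.pearsonr(covariate, y)[1],                         # switch to a different analysis
        ]
        return min(pvals)

    honest = forked = 0
    for _ in range(n_sims):
        y = rng.normal(size=n)           # outcome: noise, no real effect anywhere
        x = rng.integers(0, 2, size=n)   # "treatment" indicator
        covariate = rng.normal(size=n)   # an irrelevant variable to slice on
        honest += stats.ttest_ind(y[x == 1], y[x == 0]).pvalue < 0.05
        forked += forked_p(y, x, covariate) < 0.05

    print(f"false-positive rate, one pre-specified test:   {honest / n_sims:.1%}")
    print(f"false-positive rate, best of several analyses: {forked / n_sims:.1%}")

The first number should sit near 5%; the second will be noticeably higher, and it keeps growing with every additional path the analyst allows.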

Both of these issues came up in recent discussions of forecasting the congressional elections:

1. News organizations have been criticized for searching for data of any sort that would support their expectations of a Republican wave. In this election, tying yourself to the mast of the polls would’ve worked well, and “researcher degrees of freedom” allowed lots of the news media to simply reify their “vibes.”

2. From the other direction, some news organizations tied their hands too much by including problematic polls; most notably, fivethirtyeight.com was reluctant to let go of the notoriously undocumented Trafalgar polls. Not allowing yourself to change your analysis in midstream prevents some forms of p-hacking or the equivalent, but at the cost of letting clear problems just sit there in your analysis.

No easy answers

There’s no easy answer here, and it’s something that Elliott, Merlin, and I had to deal with back in 2020, when we found problems in our forecasting model—in the middle of the campaign, after our model’s first predictions had already been released. We bit the bullet and made some changes, deciding that, in this case, the problems with data-based model alteration were smaller than the problem of letting major known problems fester. Later on we found problems and more problems with our model but did not change it, not so much out of concern for forking paths as because fixing a model is itself not automatic and could introduce new issues of its own.

We found a completely different set of problems in the fivethirtyeight.com forecast that year, and, as far as I know, they didn’t change their model either, a decision to stand pat that made sense for them for the same reason it made sense for us. Changing a model because it makes some bad predictions is a bit like changing a recipe because the dish doesn’t taste quite right: it can take a lot of trial and error, and if you’re not careful, the new version can be worse, so this sort of adjustment is not something you want to be doing in real time.

Statistician culture, journalism culture, and economist culture

It’s my impression that people in different fields weigh these concerns differently. In statistics we are typically concerned about fitting the data, and we’ll try out all sorts of diagnostics and model adjustments. Journalists tend to be even more flexible—for them, it’s all about the vibes!

Economists fall at the other extreme: they’re very aware of the problem of “specification searches” and they also tend to overrate naive theory (“unbiasedness”); this combination leads them to avoid touching their data even in cases when there are obvious problems (as in those curvy regression discontinuity fits that keep turning up).
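To see what those curvy fits can do, here is a toy simulation of my own (the sine curve, the degree-5 polynomial, and the 0.25 bandwidth are arbitrary choices, not anything from a published analysis). The underlying relationship is smooth, with no discontinuity at all; we then estimate the “jump” at a cutoff two ways, once with a global high-degree polynomial fit separately on each side and once with a local linear fit restricted to a window around the cutoff. In this setup the global polynomial’s jump estimates come out more variable, which is exactly how smooth data can end up looking like a discontinuity.

    # Toy example: regression discontinuity "jumps" estimated from data with NO true jump.
    import numpy as np

    rng = np.random.default_rng(1)
    n, cutoff, n_sims = 500, 0.0, 500

    def jump_estimate(x, y, degree, bandwidth=None):
        """Fit separate polynomials on each side of the cutoff; return the implied jump."""
        left, right = x < cutoff, x >= cutoff
        if bandwidth is not None:                  # local fit: keep only points near the cutoff
            left &= x > cutoff - bandwidth
            right &= x < cutoff + bandwidth
        fit_left = np.polynomial.Polynomial.fit(x[left], y[left], degree)
        fit_right = np.polynomial.Polynomial.fit(x[right], y[right], degree)
        return fit_right(cutoff) - fit_left(cutoff)

    global_poly, local_linear = [], []
    for _ in range(n_sims):
        x = rng.uniform(-1, 1, n)
        y = np.sin(2.5 * x) + rng.normal(scale=0.5, size=n)   # smooth curve plus noise
        global_poly.append(jump_estimate(x, y, degree=5))
        local_linear.append(jump_estimate(x, y, degree=1, bandwidth=0.25))

    print("sd of spurious jumps, global 5th-degree polynomial:", round(float(np.std(global_poly)), 3))
    print("sd of spurious jumps, local linear near the cutoff: ", round(float(np.std(local_linear)), 3))

None of the estimated jumps reflect anything real; the difference between the two procedures is only in how confidently they can invent one.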

We discussed another one of these examples a few years ago, comparing old-school baseball analysts Bill James and Pete Palmer. Palmer set up his formula and ran with it, whereas James followed a better (in my opinion) approach of fitting his models, taking them seriously, and then carefully considering the cases where the inferences from his models didn’t seem to make sense. Sometimes in those settings he’d stick with the model and point to reasons why natural intuitions were unfounded; other times he’d change his modeling approach.

Another area where this comes up is meta-analysis, where it’s just standard practice to include all sorts of irrelevant crap. When making chili, if you include enough different flavors, you can end up with something delicious. I don’t think this works with scientific research summaries. Two notorious examples are the ivermectin meta-analysis and the “nudge” meta-analysis that included 11 papers by the disgraced food-behavior researcher Brian Wansink. “Include everything you can find” might seem like a recipe for rigorous, unbiased science, but it doesn’t really work that way if the individual ingredients are spoiled.
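Here is a minimal numerical sketch of why piling on studies doesn’t help. The study-level numbers are invented (they are not the ivermectin or nudge data), and the pooling is plain fixed-effect inverse-variance weighting rather than whatever the actual meta-analyses used. Eight unbiased “studies” of a true zero effect pool to roughly zero, but adding three studies with a built-in bias pulls the pooled estimate away from zero while shrinking its standard error, so the contamination ends up looking like precision.

    # Hypothetical meta-analysis: (effect estimate, standard error) for each study.
    # The true effect is zero; the three "biased" studies have roughly +0.5 baked in.
    import numpy as np

    honest = [(0.05, 0.15), (-0.10, 0.20), (0.02, 0.12), (0.08, 0.18),
              (-0.04, 0.16), (0.00, 0.14), (0.06, 0.22), (-0.07, 0.13)]
    biased = [(0.55, 0.15), (0.48, 0.18), (0.60, 0.20)]

    def fixed_effect_pool(studies):
        """Inverse-variance (fixed-effect) pooling: weight each study by 1/se^2."""
        est = np.array([e for e, _ in studies])
        w = 1.0 / np.array([se for _, se in studies]) ** 2
        mean = np.sum(w * est) / np.sum(w)
        se = np.sqrt(1.0 / np.sum(w))
        return mean, se

    for label, studies in [("honest studies only", honest),
                           ("honest + biased studies", honest + biased)]:
        mean, se = fixed_effect_pool(studies)
        print(f"{label:25s} pooled effect {mean:+.2f} (se {se:.2f})")

In this made-up example the contaminated pool lands a couple of standard errors away from zero even though the true effect is nil; adding more honest studies only dilutes the bias slowly, whereas vetting or dropping the spoiled ones removes it.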

There’s no clear rule for when to accept the inferences and when to question them, but it’s part of the scientific process—just ask Lakatos.

P.S. In a recent post, Nate defends his inclusion of the notorious Trafalgar poll by saying:

It’s not quite a matter of me “taking things into consideration”. The pollster ratings are determined by a formula, not my subjective view of how much I like a pollster. But since Trafalgar had an awful 2022, they’re going to do much worse once the ratings are recalculated.

That’s fine, but, really, his decision to include Trafalgar in the first place is “subjective,” as is his choice of formula, as is his decision to use a formula here in the first place. It’s turtles all the way down, dude.

To put it another way: Yes, you can choose to tie yourself to the mast, but, if so, you’ve chosen to tie yourself to the mast, and other choices are possible—as becomes clear when you consider all the masts you’ve untied yourself from, as necessary.

5 thoughts on “In research, when should you tie yourself to the mast?”

  1. This reminds me of the earliest days of economy-wide macroeconomic modeling. Contrary to your characterization of economists now (which is accurate for the most part), back then the gold standard in macro modeling was Otto Eckstein’s DRI model. But the way it worked was that Eckstein estimated an econometric model and then just tweaked all the parameters when there were parts of a forecast he didn’t like. He generated what was generally acknowledged to be the most accurate macro model, but the model was explicitly “two parts data, one part Eckstein.” He sold DRI in 1979 for $100 million, though why McGraw-Hill (who bought it) wanted the model without Eckstein is a mystery. The problem with not tying yourself to the mast is that what you’ve done is at best explainable, but not reproducible.

    By contrast, Ray Fair tied himself to the mast, at least relatively. His Fair Model is still around today (as is he) and while he cheerfully admits his model is less accurate than a lot of macro models, he maintains (or he used to when we discussed it in the 70s) that it was the most accurate model that didn’t make ad hoc adjustments to parameters based on its predictions.

  2. Andrew’s blog of today ends with the mysterious admonition,

    “There’s no clear rule for when to accept the inferences and when to question them, but it’s part of the scientific process—just ask Lakatos.”

    Back on 2/11/2021 Andrew also mentioned Lakatos and this led me to

    https://www.dropbox.com/s/2sivwbs683gc3nz/ROH-Lakatos.pdf?dl=0

    “In all likelihood, hardly anyone would consider the possibility that he was a political prisoner simply because he was more Stalinist than many of the leaders of the Hungarian Communist Party. But there is evidence that this was exactly the case. It seems, furthermore, that Lakatos actually remained a Stalinist even after leaving prison:”
    “Lakatos was such a dominating influence on the minds of other group members that they were ready to obey all his instructions unconditionally. In some respects the atmosphere resembled that of a sect like Jim Jones’s infamous People’s Temple community in Jonestown.”
    “All in all, a lot of the evidence points to the possibility that Lakatos was a psychopath, which is indeed how he was described by Dr. Klára Majerszky, who worked at the National Psychiatric and Neurological Institute in Budapest and who knew Lakatos personally before the war.”

    Much gossip reverberates about Lakatos and his behavior, both within Hungary and while he was at LSE. In fact, back in 1965 I could have asked Lakatos myself, because I too was working at the London School of Economics, but I did not become aware of his interests, fame, and history until well after his death in 1974.

  3. “No easy answers”
    Indeed — but this is what distinguishes good science from bad. Pre-registration is often a good idea, but in the real world it is not a panacea. Data are collected and show some feature that was unexpected, and maybe one now needs to do something that was not pre-registered. Yes, that opens the door to p-hacking, but how one handles it is what makes the difference between a good scientist and a bad or dishonest one. I work on genetic history and ancient DNA. When a new set of data becomes available, I often have no idea what I will find. Pre-registration is not feasible.

  4. Speaking of the media: since they make huge money from political ads, they have an incentive to sell a race. The effect on their decision-making can be subtle, as when the idea that they need to be fair infiltrates their processes.

  5. Nice post. Lately I’ve been structuring my research talks as “here are some ideas I took very seriously at some point and here’s where they crumbled in various ways.” I like giving talks this way. I see the high-level mentality behind model checking, where it’s good to take things seriously but only if you’re willing to search for and acknowledge examples that suggest not taking them so seriously, as a nice philosophy for doing research. It makes critiquing your own work seem more natural.
