What’s a multiverse good for anyway?

Posted on February 12, 2026 2:00 PM by Jessica Hullman

This is Jessica. Imagine that it’s trivially easy to hand an empirical paper to a generative AI agent and have it construct a multiverse showing the results you’d see given slightly different decisions about how to analyze the data (this is already basically true). What’s the value of this? How much more (or less) would we learn from the average paper if this were the default?

Multiverse analysis, like some other attempts to overcome selective reporting of results, lives somewhere in between solution and “but wait, what exactly do I do with this?” Since the 2016 specification curve paper, the multiverse has inspired attempts to theorize what it should be–e.g., a set of results derived from making different analysis choices where the different decisions are “genuinely arbitrary”–and motivated the development of software packages to support specifying and visualizing results.

My own perspective as someone who has participated in research on it is probably best summarized by the questions “How realistic is it to expect multiverse to be used widely, given that most authors first and foremost want to convince readers they have a clear point?” and “How do we make multiverse useful for readers, who may be struggling to accurately interpret uncertainty in even a single analysis?” Along similar lines, I recall hearing Andrew once say something like, “When we proposed it we didn’t think people would actually start doing it.”

On the other hand, being up front about our uncertainty around the right way to analyze our data is clearly better than ignoring it. Multiverse analysis has helped us recognize how arbitrary results can be. It can be a valuable rhetorical tool.

In the spirit of such questions, in our paper “What’s a multiverse good for anyway?” Julia Rohrer, Andrew, and I write:

Multiverse analysis has become a fairly popular approach, as indicated by the present special issue on the matter. Here, we take one step back and ask why one would conduct a multiverse analysis in the first place. We discuss various ways in which a multiverse may be employed – as a tool for reflection and critique, as a persuasive tool, as a serious inferential tool – as well as potential problems that arise depending on the specific purpose. For example, it fails as a persuasive tool when researchers disagree about which variations should be included in the analysis, and it fails as a serious inferential tool when the included analyses do not target a coherent estimand. Then, we take yet another step back and ask what the multiverse discourse has been good for and whether any broader lessons can be drawn. Ultimately, we conclude that the multiverse does remain a valuable tool; however, we urge against taking it too seriously.

A multiverse as a tool for reflection and critique is closest to the spirit of Steegen et al.’s 2016 use of it to interrogate effects of fertility on political attitudes and religiosity: a postmortem gesture to the fact that the results could have been different. Multiverse analysis shines as a way to raise awareness of the prevalence and consequences of seemingly arbitrary analysis decisions.

But multiverse can also be a brute force tool for persuasion, as in Julia et al.’s use of it to show how robustly birth order fails to predict personality traits. There’s also been interest in using it as a serious inferential tool, accompanied by theories of what it means for a multiverse to be valid or rigorously constructed, or how we should think about sampling from it in cases where running all analyses is infeasible.

The hard question is: How do you decide what paths are justified to include? A paper by Del Giudice and Gangestad attempts to lay down ground rules, with the idea that a multiverse can mislead if it combines analyses that are not a priori indistinguishable. You want the differences between paths to truly be arbitrary, as if we have a flat prior over their plausibility. This motivates questions like, Are the measurements equally valid? Is the estimand the same? Are the causal assumptions the same? These may not be easy calls to make. For example, often we are uncertain about the true data-generating process. When we are comparing models with different covariate structure, throwing them all in a single multiverse is not necessarily going to give us something interpretable, because there is only one true process. However, what kinds of equivalence are needed will also depend on your goal in using a multiverse. If you’re only using it to critique underspecification in a given area, maybe you want to include paths with different estimands to help make your point. All this is to say that theorizing the multiverse is not straightforward: there’s a lot of nuance in how to think about a “valid” multiverse and interpret the variability in results. And common default interpretations (e.g., treating the relative frequency of outcomes as informative about what is likely to be correct) are not necessarily valid.
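To make the point about relative frequencies concrete, here is a minimal sketch in Python on simulated data. Everything in it is an assumption made up for illustration (the variables x, y, and z, the trimming cutoffs, the 0.2 threshold); it is not from our paper. The exposure has no effect on the outcome, and the only thing that differs between the two multiverses is how many variants of the unadjusted analysis someone chose to include.

```python
import numpy as np

# Hypothetical simulated data: z confounds x and y, and x has NO effect on y.
rng = np.random.default_rng(1)
n = 1000
z = rng.normal(size=n)        # confounder
x = z + rng.normal(size=n)    # exposure
y = z + rng.normal(size=n)    # outcome depends on z only

def slope(adjust, cutoff):
    """OLS coefficient on x, optionally trimming extreme x and adjusting for z."""
    if cutoff is None:
        keep = np.ones(n, dtype=bool)
    else:
        keep = np.abs(x - x.mean()) <= cutoff * x.std()
    cols = [np.ones(keep.sum()), x[keep]] + ([z[keep]] if adjust else [])
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y[keep], rcond=None)
    return float(beta[1])

def share_showing_effect(paths, threshold=0.2):
    """Fraction of paths whose estimate exceeds an (arbitrary) threshold."""
    return sum(slope(adjust, cutoff) > threshold for adjust, cutoff in paths) / len(paths)

# Each path is (adjust_for_z, trimming_cutoff_in_SDs).
balanced = [(False, None), (True, None)]
lopsided = [(False, None), (False, 2.0), (False, 3.0), (True, None)]

print("balanced multiverse:", share_showing_effect(balanced))
print("lopsided multiverse:", share_showing_effect(lopsided))
```

The share of paths "showing an effect" moves with how the multiverse was assembled, even though the data and the truth (no effect) stay fixed. The adjusted and unadjusted paths are also not answering the same question, which is exactly the estimand problem described above.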

So as soon as you move out of reflection-and-critique-land, and start using the multiverse to make specific points rather than merely gesture at uncertainty, you open up questions about what paths belong in the multiverse. Sometimes there may be a clear set of reference studies for the effect you study, so you just include all of the variations that those studies tried. But often it’s not that easy, and domain knowledge becomes important to what different researchers conclude about the right way to set it up. There’s no real reason to expect it to be easier to get consensus on multiverse construction than it would be to get consensus on a single analysis, just like there’s no reason to think getting a group of people to agree on a twelve-course tasting menu with wine pairings is easier than getting them to agree on plain cheese or pepperoni pizza. This challenges our ability to use the multiverse as a serious inferential tool, where we can look at the results and say, Yes, this establishes what we know about this research question.

None of this is meant to say that multiverse analysis isn’t often a valuable thing to do. It’s a powerful conceptual tool for reflecting on uncertainty, and our article should not be read as condemning it. It’s also natural (and valuable from a meta-scientific standpoint) for some exploration to occur over time as a method becomes popularized, and researchers want to see what happens if we take it seriously. We just caution against viewing multiverse analysis as a data-driven way to defer hard decisions. There’s a difference between using a multiverse to acknowledge uncertainty and using it to try to resolve that uncertainty while avoiding the thorniness of theoretical commitments. The latter is not a win for science.

Multiverse analysis and generative AI

On that note, going back to the question at the start of this post, I suspect theories around the value of a multiverse will only become more relevant as generative AI is increasingly used to both produce and analyze research results. We don’t talk about this in the paper, but we’re already at the point where a paper can trivially become an interactive interface–give it to Claude Code and ask it to replicate the analyses. It’s a tiny step from there to generating a multiverse by first identifying possible decision points and then varying them.
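For readers who haven’t seen one, the mechanical part of "identify decision points, then vary them" can be sketched in a few lines of Python. This is a toy under stated assumptions, not what an AI agent or any multiverse package actually does: the simulated data, the two decision points (outlier_cutoff and adjust_for_z), and the function names are all hypothetical.

```python
import itertools
import numpy as np

def make_data(n=500, seed=0):
    """Simulate hypothetical data: z affects both x and y; true x effect is 0.3."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = 0.5 * z + rng.normal(size=n)
    y = 0.3 * x + 0.4 * z + rng.normal(size=n)
    return x, y, z

def ols_slope(x, y, covariates=()):
    """Least-squares coefficient on x, with an intercept and optional covariates."""
    X = np.column_stack([np.ones_like(x), x, *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])

# Hypothetical decision points; a real analysis would have its own list.
DECISIONS = {
    "outlier_cutoff": [None, 2.5, 3.0],  # drop y values beyond this many SDs
    "adjust_for_z": [False, True],       # include z as a covariate or not
}

def run_multiverse(x, y, z, decisions=DECISIONS):
    """Run one analysis per combination of decisions and collect the estimates."""
    names = list(decisions)
    results = []
    for combo in itertools.product(*decisions.values()):
        spec = dict(zip(names, combo))
        keep = np.ones(len(y), dtype=bool)
        if spec["outlier_cutoff"] is not None:
            keep = np.abs(y - y.mean()) <= spec["outlier_cutoff"] * y.std()
        covs = (z[keep],) if spec["adjust_for_z"] else ()
        results.append({**spec, "n": int(keep.sum()),
                        "slope": ols_slope(x[keep], y[keep], covs)})
    return results

if __name__ == "__main__":
    for row in run_multiverse(*make_data()):
        print(row)
```

Enumerating the grid is the easy part, which is why it is now nearly free. Nothing in the loop says whether the six paths target the same estimand or are equally defensible; that judgment still has to come from somewhere else.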

If we can ablate analysis decisions in many different ways and surface this for reviewers, will this make peer review more informative? Will research become more generalizable? Or will we filter out some truly innovative research in favor of safe incremental results? The point is that lots of forms of robustness checking that used to be difficult are now trivially easy, so these questions about how we think about the right set of ablations and how we interpret the results become critical. The fact that variability in results from different ablations on the method is not straightforward to interpret is something I hope those working on AI for science will take seriously. Otherwise we end up with comprehensive tests of whether results align with questionable heuristics at scale.

P.S. An anecdote about this paper … I was in my usual coffee shop talking to Julia on zoom about this project, and a guy who was sitting near me overheard me saying “multiverse.” After the call he asked if he could pick my brain on how to learn more about this topic. He apologetically explained that he wasn’t an academic or scientist at all but was very interested in the research and was looking for any pointers he could get. So I was like, Sure, and I said some stuff about multiverse research and pointed him to some lecture series at Northwestern that might sometimes cover related topics, by which I meant data science, reproducibility, statistics, etc. But something about the befuddled look on his face made me pause. Anyway, later I realized that he probably thought I was talking about a very different kind of multiverse, and regretted his boldness when I launched into it about statistics!

4 thoughts on “What’s a multiverse good for anyway?”

Andrew on February 12, 2026 at 3:42 pm said:

Jessica:

Thanks for the post. Also, for people who want to read our paper without having to log into a website, they can find it here.

AAAnonymous on February 12, 2026 at 4:26 pm said:

(I suck at all things statistics, so please forgive me if I am misunderstanding)

This post, and the preprint, reminded me of a blog post on here titled “Let’s analyze how we analyze,” dated March 25, 2025. I think there was mention of many-analysts projects, and your preprint mentions the Silberzahn et al. (2018) paper. I wondered about possibly including some sentences about such many-analysts projects in your section 2, titled “The multiverse as a persuasive tool”.

I can see how a case of many labs, or different researchers, presenting many analyses might somehow be seen as extra persuasive. Or it might make it a bit harder for an individual scientist to be critical, because it might (consciously or unconsciously) be a bit harder for some people to go against a large group of researchers presenting several analyses. I wondered whether a mention of this might be applicable and appropriate for your manuscript, especially given your section 2 and the fact that you already refer to such a many-analysts project (Silberzahn et al., 2018).

gec on February 13, 2026 at 8:10 am said:

I agree with your perspective that the “multiverse” provides a valuable perspective, but has limited practical utility.

For one thing, I think the value of the “multiverse” perspective is that it is one way to push against the tendency to state conclusions too broadly. The conclusions of any analysis applied to a particular dataset are always conditional on assumptions. Considering the “multiverse” forces you to consider what your assumptions were (since often they are implicit) and the extent to which they are justifiable in any particular context. But rather than see that as a reason to recommend that people do a whole bunch of analyses on the same data, I see that as a reason to make the assumptions of any single analysis more explicit.

For another thing, the “multiverse” idea extends beyond analysis to include design and measurement. The conclusions we would like to draw are with respect to general mechanisms or theoretical constructs (like “mass”, “preference”, “socioeconomic status”, etc.). But there are always multiple ways to operationalize/measure those constructs, different ways of experimentally intervening on them, and different ways of sampling. A true “multiverse” would include every possible measurement, sampling process, and experimental intervention. So on the one hand, I think a “multiverse” perspective can be useful regarding those issues just like it can regarding analysis–it forces us to consider what implicit assumptions we have made and what other options are available. But on the other hand, it points to the impracticality of the multiverse as a procedure.

- AAAnonymous on February 13, 2026 at 8:36 am said:
  
  Quote from above: “For another thing, the “multiverse” idea extends beyond analysis to include design and measurement.”
  
  And might it also involve including not only every possible measurement, sampling process, and experimental intervention but also every possible scientist? Imagine a world, or universe, where not a small group of people but every single person performs research!
  
  Or, perhaps that’s only a conversation to be had with someone at a coffee shop…
  

Statistical Modeling, Causal Inference, and Social Science
