Multiverse R package

This is Jessica. Abhraneel Sarma, Alex Kale, Michael Moon, Nathan Taback, Fanny Chevalier, Matt Kay, and I write,

There are myriad ways to analyse a dataset. But which one to trust? In the face of such uncertainty, analysts may adopt multiverse analysis: running all reasonable analyses on the dataset. Yet this is cognitively and technically difficult with existing tools—how does one specify and execute all combinations of reasonable analyses of a dataset?—and often requires discarding existing workflows. We present multiverse, a tool for implementing multiverse analyses in R with expressive syntax supporting existing computational notebook workflows. multiverse supports building up a multiverse through local changes to a single analysis and optimises execution by pruning redundant computations. We evaluate how multiverse supports programming multiverse analyses using (a) principles of cognitive ergonomics to compare with two existing multiverse tools; and (b) case studies based on semi-structured interviews with researchers who have successfully implemented an end-to-end analysis using multiverse. We identify design tradeoffs (e.g. increased flexibility versus learnability), and suggest future directions for multiverse tool design.

Here it is on CRAN. And here’s the GitHub repo.

A challenge in conducting multiverse analysis is that you have to write your code to branch over any decision points where there is uncertainty about the right choice. This means identifying and specifying dependencies between paths, such as cases where running one particular model specification requires one particular definition of a variable. Relying on standard imperative programming solutions like for loops leads to messy, error-prone code that is hard to debug, run, and interpret later. Additionally, depending on how the code is executed, an analyst might have to wait until the entire multiverse has executed before they can discover errors in some paths. This makes debugging slower.
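To make the pain point concrete, here is a rough sketch (not from the package; the dataset, variables, and decisions are made up) of what the imperative approach tends to look like: every decision point becomes a loop, and every dependency between decisions becomes a hand-written guard that is easy to get wrong.

```r
# Naive imperative multiverse: nested loops over every decision point.
# All names here are hypothetical, purely for illustration.
results <- list()
for (exclusion in c("none", "drop_outliers")) {
  for (coding in c("binary", "continuous")) {
    for (model in c("linear", "poisson")) {
      # dependencies between paths must be hand-coded guards
      if (model == "poisson" && coding == "continuous") next
      d <- if (exclusion == "drop_outliers") subset(data, y < 3 * sd(y)) else data
      d$x <- if (coding == "binary") as.integer(d$x_raw > median(d$x_raw)) else d$x_raw
      fit <- if (model == "linear") lm(y ~ x, data = d) else glm(y ~ x, family = poisson, data = d)
      results[[paste(exclusion, coding, model, sep = "_")]] <- coef(fit)[["x"]]
    }
  }
}
```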

There are a few existing tools for specifying a multiverse, but this package lets the author build things up from a single analysis (which seemed more realistic to us than expecting them to start from an omniscient view of the entire multiverse), and it interfaces with the sort of iterative workflow one might expect in computational notebooks. Executing a multiverse is optimized relative to computing every single path separately, by sharing results among related subpaths. Immediate feedback is provided on a default analysis, which the author can control.
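As a rough sketch of what this looks like in practice (based on my reading of the package vignettes; check the documentation for the exact syntax, and note that the dataset and variable names below are made up), the same analysis as above can be declared with branch() inside a multiverse object, with dependencies between decisions expressed inline rather than as guards:

```r
# Declarative version of the sketch above, using the multiverse package's
# branch()/inside() interface (syntax per the vignettes; names are hypothetical).
library(multiverse)

M <- multiverse()
inside(M, {
  d <- branch(exclusion,
              "none"          ~ data,
              "drop_outliers" ~ subset(data, y < 3 * sd(y)))
  d$x <- branch(coding,
                "binary"     ~ as.integer(d$x_raw > median(d$x_raw)),
                "continuous" ~ d$x_raw)
  fit <- branch(model,
                "linear"  ~ lm(y ~ x, data = d),
                # a dependency between decisions, declared where it belongs
                "poisson" %when% (coding == "binary") ~ glm(y ~ x, family = poisson, data = d))
  estimate <- coef(fit)[["x"]]
})

execute_multiverse(M)  # run every compatible combination of choices
expand(M)              # one row per universe, with its parameter assignment
```

The default analysis (which the author can change) executes immediately in the notebook, which is what provides the immediate feedback described above.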

As the abstract describes, the package has seen a little use so far (including for a virtual-reality-related multiverse with millions of paths), so we think this kind of design pattern has some promise.

PS: See my collaborators’ other work on interactive papers for communicating multiverse results and visualization approaches. And stay tuned for more work led by Abhraneel on interactive visualization to probe results of a multiverse analysis.

18 thoughts on “Multiverse R package”

    • Rahul, I don’t get your question. A Bayesian analysis is one analysis. Here, they are doing multiple analyses under different assumptions. E.g., if one codes female hurricanes as 1 and other hurricanes as 0, then that’s one analysis; Bayesian or not is irrelevant. If one codes female on a continuum (as a continuous value), that’s another analysis, Bayesian or not. It’s not the prior specification that is driving the outcome here; things can change quite a bit (especially if one is fixated on statistical significance).
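      For instance, the two codings above would just be two options of a single decision, with everything downstream (Bayesian or not) unchanged. A hypothetical sketch using the package’s branch() syntax (variable name and cutoff made up):

```r
# One decision, two codings of hurricane-name femininity (illustrative names/values).
library(multiverse)
M <- multiverse()
inside(M, {
  dat$femininity <- branch(gender_coding,
                           "binary"     ~ as.integer(dat$mas_fem_rating > 6),  # female = 1, other = 0
                           "continuous" ~ dat$mas_fem_rating)                  # keep the full rating
})
```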

    • If you have a rich enough hypothesis space, I suppose it could be the same thing and just becomes a model-uncertainty exercise, but you are going to have a very hard time making your Stan or JAGS code do the discrete jumps over every possible analytic choice like what data transforms to run or what subsets to drop, not to mention all of the variables or different kinds of models or outcomes. Much easier to just define the pipeline with all of the choices at each stage and run the standard analysis and aggregate the point estimates for your multiverse.

  1. I was curious about the package, installed it, and then looked at the vignette(“visualising-multiverse”) output. The animated plot is interesting. You could incorporate a design analysis, Gelman and Carlin 2014, to summarize the prospective power and Type M error properties of each decision. For me, that’s always more interesting than whether the confidence interval crosses zero, which is what people will focus on in the animated plot. Maybe you do that already and I missed it in the GitHub repo example.
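    A sketch of what I mean, adapted from the retrodesign() code in the Gelman and Carlin paper (the hypothesized effect size A and standard error s for a given universe are placeholders you would have to supply):

```r
# Design analysis per Gelman & Carlin (2014): prospective power, Type S, and
# Type M (exaggeration ratio) for a hypothesized effect A and standard error s.
retrodesign <- function(A, s, alpha = 0.05, df = Inf, n.sims = 10000) {
  z <- qt(1 - alpha / 2, df)
  p.hi <- 1 - pt(z - A / s, df)
  p.lo <- pt(-z - A / s, df)
  power <- p.hi + p.lo
  typeS <- p.lo / power                                 # prob. the sign is wrong, given significance
  estimate <- A + s * rt(n.sims, df)
  significant <- abs(estimate) > s * z
  exaggeration <- mean(abs(estimate[significant])) / A  # Type M error
  list(power = power, typeS = typeS, exaggeration = exaggeration)
}

retrodesign(A = 0.1, s = 0.3)  # run once per universe / decision of interest
```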

    Although to be honest, if I were doing such an analysis, I would want to do it by hand. What we do in practice is pre-register the analysis, and report any exploratory analyses in a separate section.

    • Interesting idea. We’d need an estimate of the true effect size, but that would allow for some quantitative comparison between results beyond what the researcher intended. Abhraneel, Matt and I have some current work aimed at consumers of multiverse analyses where these kinds of things could be helpful.

      On wanting to do it by hand on a single analysis with the rest as exploratory in a different section … this way of thinking about sensitivity analysis is why I like the design pattern we use here of starting with the single analysis. Though I suspect many researchers would still be nervous to try something like this in practice because it requires being comfortable with jumping up to a more critical, “meta” level to define and communicate your own uncertainty as you do the analysis. I hope that becomes more standard, but it will mean needing to get more comfortable with telling stories about uncertainty.

  2. The package looks nice and useful!

    I do have a minor rant on the presentation of multiverse results. In many cases, multiverse results are reported as if all the individual models were exchangeable. That’s IMHO both highly implausible and opens the door to misleading interpretations – I can make the overall distribution of p-values/effects/… look very different by the choice of which models to include, or of how many specific values I choose to use for a continuous parameter.

    So I think the authors should make it possible for the reader to actually map the individual estimates to their corresponding models and thus be able to make judgements on the plausibility of the model specifications that yielded them (the original Steegen et al. paper succeeds in that). The specification curve approach highlighted in the package vignette goes IMHO in the right direction, but this goal is generally hard to achieve if the multiverse is too big (and the results indeed change substantially based on modelling choices). But I think that especially when the multiverse is big, it is the duty of the researcher not to just report all the values, but also to try to extract some understanding/abstraction of the results (e.g. which components seem to have a big influence, etc.). A rough sketch of the kind of display I have in mind is below.

    The main idea is from a blog by Julia Rohrer (http://www.the100.ci/2021/03/07/mulltiverse-analysis/) which I recommend. A Twitter thread with some other options I’ve tried to visualise the results: https://twitter.com/modrak_m/status/1368542685175382018
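    To make the mapping concrete, here is a rough sketch of a specification-curve style display: sorted estimates on top, and below them the option each universe chose for each decision, so a reader can trace any estimate back to its specification. The column names (estimate, conf.low, conf.high, and the decision columns) are placeholders for whatever the multiverse produces.

```r
# Specification-curve sketch with made-up column names in the `results` data frame.
library(dplyr)
library(tidyr)
library(ggplot2)

specs <- results %>% arrange(estimate) %>% mutate(rank = row_number())

# Top panel: sorted estimates with intervals.
p_estimates <- ggplot(specs, aes(rank, estimate)) +
  geom_point(size = 0.5) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0) +
  geom_hline(yintercept = 0, linetype = "dashed")

# Bottom panel: which option each universe took for each decision.
p_decisions <- specs %>%
  pivot_longer(c(exclusion, coding, model), names_to = "decision", values_to = "option") %>%
  ggplot(aes(rank, option)) +
  geom_point(shape = "|") +
  facet_grid(decision ~ ., scales = "free_y", space = "free_y")
```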

  3. I always thought of the “multiverse” concept as more of a reductio ad absurdum example.

    This package cannot address the main problem, which is that the analyst typically does not have data on all the important variables. Or often doesn’t even know what those variables are.

    And when we do have all the important variables, it is because the type of data collected was informed by a rationally derived model. In that case we aren’t using these kinds of default statistical models anyway.

    • I don’t completely disagree (I vaguely recall Andrew once mentioning to me that when they wrote the original multiverse paper, they didn’t imagine people actually doing it!). It may be that the information that the analyst doesn’t have access to or can’t conceive of is where most of the value of multiplexing would be. Though I still see some value in multiplexing over decision points like data filtering, transformations, or variations on a model specification which remain of uncertain value for a given dataset, as a way of bringing to the surface modeling assumptions that often remain hidden. At a higher level, eliminating implementation hurdles through tools like this makes it easier to study how hard the conceptual hurdles are to address in practice.

  4. Adoption of tools like this would be superior to current standards, since the uncertainty reported for the coefficients will only be increased (and thus approach the correct answer).

    But it seems like a lot of wasted computation if we already know analytically that (if the entire multiverse of models was truly explored*) the uncertainty will grow to the point where we have nothing useful to say about the value of interest.

    Ie, if the model is determined by the data available rather than vice versa, we can safely assume the coefficients are arbitrary.

    * Think of how many thousands or millions of models you can generate with symbolic regression in minutes.

    • >But it seems like a lot of wasted computation if we already know analytically that (if the entire multiverse of models was truly explored*) the uncertainty will grow to the point where we have nothing useful to say about the value of interest.

      I’m trying to understand what you mean … are you assuming that the analyst has no ability to say, prior to obtaining data, what sorts of models are more appropriate versus less appropriate?

      • I’m saying the number of equally plausible statistical models is effectively infinite. Eg,

        Although more than 600 million specifications might seem high, this number is best understood in relation to the total possible iterations of dependent (six analysis options) and independent variables (2^24 + 2^25 − 2 analysis options) and whether covariates are included (two analysis options). The number rises even higher, to 2.5 trillion specifications, for the MCS if any combination of covariates (2^12 analysis options) is included.

        […]

        Because we are examining something inherently complex, the likelihood of unaccounted factors affecting both technology use and well-being is high. It is therefore possible that the associations we document, and those that previous authors have documented, are spurious.

        For the sake of simplicity and comparison, simple linear regressions were used in this study, overlooking the fact that the relationship of interest is probably more complex, non-linear or hierarchical [13].

        https://www.nature.com/articles/s41562-018-0506-1

        Plug one of your datasets into some symbolic regression software and observe how many models get generated.

        When we derive the model from some assumptions (ie, the theory/explanation), then go seek the data needed to constrain our parameters, it is a very different situation.

        • The number of equally plausible models is probably much smaller than most analyses lead you to believe.

          In your example simulation you include 4 biomarkers, genotype, pain, age, and fatigue to predict depression via a DAG. Since it is your simulation, you know what generated the depression score. But in real life, there would be other possible biomarkers, recent life events (eg, death of a loved one), and all of these would also dynamically influence each other over time. Whatever the true model is, it would look nothing like that DAG. Even if you do some principled pruning, the multiverse of possibilities is still huge.

          Think about a case where we really do know the true model. The volume of a box is L*W*H, but pretend we didn’t know that.

          Could you get to that answer by trying out regression/DAG models? There are all sorts of predictors you can throw in that will be correlated with the volume: color, material, temperature, mass, GPS coordinates, date of construction, velocity, whatever. And the L, W, H are all measured with some error and correlated with each other. Further, the answer isn’t a linear model like a*L + b*W + c*H.
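          A tiny simulation of that point (all numbers made up): a linear model in noisy L, W, H plus an irrelevant-but-correlated predictor fits the data respectably, yet has nothing to do with the true mechanism volume = L*W*H.

```r
# Box-volume thought experiment: good-looking fit, wrong functional form.
set.seed(1)
n <- 500
L <- runif(n, 1, 10); W <- runif(n, 1, 10); H <- runif(n, 1, 10)
volume <- L * W * H                        # the true model
L_obs <- L + rnorm(n, sd = 0.5)            # dimensions measured with error
W_obs <- W + rnorm(n, sd = 0.5)
H_obs <- H + rnorm(n, sd = 0.5)
mass  <- 0.2 * volume + rnorm(n, sd = 20)  # correlated but mechanistically irrelevant predictor
fit <- lm(volume ~ L_obs + W_obs + H_obs + mass)
summary(fit)$r.squared                     # respectable R^2 despite the wrong form
```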

        • I can’t tell if that is sincere or sarcastic, but I will stick with my claim that we can come up with an effectively infinite set of models to fit a given dataset. Each of those models, in turn, can map to multiple possible theories (sets of assumptions).

          However, there are fewer mappings of plausible assumptions (some of which can hopefully be checked with other types of data) to models… than from data to models. So instead of only doing:

          Abduction: data -> model -> theory (set of assumptions)

          We also need to do:

          Deduction: theory (set of assumptions) -> model -> data

          Both steps are necessary. The value of the deduction step is that there is one model per theory.* But we always need to remember affirming the consequent: there can be multiple theories that map to the same models.

          * For a given type of data. Another advantage is that there are typically a number of different predictions we can derive from a single theory.
