“The Multiverse of Methods: Extending the Multiverse Analysis to Address Data-Collection Decisions”

Jenna Harder writes:

When analyzing data, researchers may have multiple reasonable options for the many decisions they must make about the data—for example, how to code a variable or which participants to exclude. Therefore, there exists a multiverse of possible data sets. A classic multiverse analysis involves performing a given analysis on every potential data set in this multiverse to examine how each data decision affects the results. However, a limitation of the multiverse analysis is that it addresses only data cleaning and analytic decisions, yet researcher decisions that affect results also happen at the data-collection stage. I propose an adaptation of the multiverse method in which the multiverse of data sets is composed of real data sets from studies varying in data-collection methods of interest. I walk through an example analysis applying the approach to 19 studies on shooting decisions to demonstrate the usefulness of this approach and conclude with a further discussion of the limitations and applications of this method.
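
In code, the “classic” version she describes is just a loop: enumerate every combination of defensible data-processing choices, apply each combination to the raw data, and run the same analysis in each resulting universe. Here is a minimal sketch in Python; the column names, cutoffs, and data file are hypothetical placeholders, not anything from the paper:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Each data-processing decision and its reasonable options (hypothetical).
    exclusion_rules = {
        "keep_all": lambda d: d,
        "drop_low_accuracy": lambda d: d[d["accuracy"] > 0.5],
    }
    rt_cutoffs = [None, 2000, 3000]  # ms; None = no trimming

    def analyze(raw, exclude, cutoff):
        d = exclude(raw)
        if cutoff is not None:
            d = d[d["rt"] < cutoff]
        # condition is assumed coded 0/1, so the slope is the effect of interest
        fit = smf.ols("rt ~ condition", data=d).fit()
        return fit.params["condition"], fit.pvalues["condition"]

    raw = pd.read_csv("shooting_task.csv")  # hypothetical file name
    rows = []
    for rule_name, exclude in exclusion_rules.items():
        for cutoff in rt_cutoffs:
            est, p = analyze(raw, exclude, cutoff)
            rows.append({"exclusion": rule_name, "rt_cutoff": cutoff,
                         "estimate": est, "p": p})
    multiverse = pd.DataFrame(rows)  # one row per universe; summarize or plot from here

Harder’s extension swaps out the rows of that table: instead of one data set processed many ways, the multiverse is made of real data sets from studies that varied in their data-collection methods, each run through the same analysis.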

I like this because of the term “classic multiverse analysis.” It’s fun to be a classic!

In all seriousness, I like the multiverse idea, and ideally it should be thought of as a step toward a multilevel model, in the same way that the secret weapon is both a visualization tool and an implicit multilevel model.
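
To make that connection concrete, here is a minimal sketch that treats the per-universe (or per-study) estimates and standard errors as data and partially pools them with a simple random-effects model. The numbers are made up, and the DerSimonian-Laird moment estimator is used only because it needs no extra dependencies; a full Bayesian multilevel model would be the natural next step:

    import numpy as np

    # Hypothetical per-universe estimates and standard errors; in practice
    # these would come from the multiverse of fits.
    est = np.array([0.12, 0.05, 0.20, -0.02, 0.09])
    se = np.array([0.06, 0.07, 0.05, 0.08, 0.06])

    w = 1 / se**2
    fixed = np.sum(w * est) / np.sum(w)   # complete-pooling estimate
    Q = np.sum(w * (est - fixed)**2)      # heterogeneity statistic
    # DerSimonian-Laird moment estimate of the between-universe variance
    tau2 = max(0.0, (Q - (len(est) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

    w_star = 1 / (se**2 + tau2)
    pooled = np.sum(w_star * est) / np.sum(w_star)   # partial-pooling estimate
    # Shrink each universe's estimate toward the pooled value (assumes tau2 > 0,
    # which holds for these made-up numbers)
    shrunk = (est / se**2 + pooled / tau2) / (1 / se**2 + 1 / tau2)

    print(f"between-universe sd: {np.sqrt(tau2):.3f}, pooled effect: {pooled:.3f}")
    print("shrunken estimates:", np.round(shrunk, 3))

The between-universe standard deviation is the payoff: it quantifies how much the answer depends on which universe you happen to be standing in, which the usual multiverse display conveys only informally.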

I’m also glad that Perspectives on Psychological Science has decided to publish research again, and not just publish lies about people. That’s cool. I guess an apology will never be coming, but at least they’ve moved on.

2 thoughts on ““The Multiverse of Methods: Extending the Multiverse Analysis to Address Data-Collection Decisions””

  1. The ‘multiverse’ is extensible beyond simple variable recodings and transformations. While the classic version may be useful in carefully designed experimental and hypothesis-testing situations, it has less utility in applied settings.

    Why not extend model evaluation beyond the conventional single metric of predictive accuracy (MSE, RMSE, MAD, etc.) to a cloud of metrics: the usual accuracy measures plus measures of linear and nonlinear dependence such as Pearson, Spearman, Hoeffding’s D, distance correlation, entropy, and mutual information?

    In addition to multiple metrics, why not go beyond the conventional single, pre-specified functional form and fit multiple models of varying functional forms, each of which would tell a different story about the information in the data? (A sketch of this idea appears after these comments.)

    Just saying.

  2. Funny, your previous post (re: COVID vs. Spanish flu) inspired a thought related to this post. My thought was that there are (at least) two types of forks in data analysis paths: one type is equivalent to deciding or changing your research question after looking at the data (this is bad), while the other involves making one set of analytical choices rather than another set, without impacting the construct-level research question (this is unavoidable).

    I was wondering whether you (or anyone) has tried to quantify the impact of the second type of fork as literal non-sampling error, and this post partially answers that question. Your “classic” version seems more qualitative and post hoc, rather than intended for estimating a standard error. I’m not sure many people would adopt a quantitative version of the procedure in practice (it would invariably widen uncertainty intervals), other than as a factor to examine in meta-analysis (analyses with more forks ought to have more variance, all else being equal). But as a thought experiment, I think it provides insight into why strong declarations of conclusions are usually not justified for a single study.
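
For concreteness, here is a minimal sketch of the multi-metric, multi-model idea from the first comment, on synthetic data. The particular models and the subset of metrics shown are illustrative assumptions (Hoeffding’s D and distance correlation would need additional packages):

    import numpy as np
    from scipy.stats import pearsonr, spearmanr
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    from sklearn.model_selection import train_test_split

    # Synthetic data with a nonlinear component, so the model forms disagree.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    y = X[:, 0] + np.sin(2 * X[:, 1]) + 0.5 * rng.normal(size=500)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    models = {
        "linear": LinearRegression(),
        "forest": RandomForestRegressor(random_state=0),
    }

    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        metrics = {
            "rmse": mean_squared_error(y_te, pred) ** 0.5,
            "mae": mean_absolute_error(y_te, pred),
            "pearson": pearsonr(y_te, pred)[0],
            "spearman": spearmanr(y_te, pred)[0],
        }
        print(name, {k: round(v, 3) for k, v in metrics.items()})

Each (model, metric) pair is another universe in the sense of the post: the different functional forms and the different metrics will not always agree on which fit is better, and that disagreement is itself part of the story the data tell.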
