Supporting Bayesian modeling workflows with iterative filtering for multiverse analysis

Anna Riha, Nikolas Siccha, Antti Oulasvirta, and Aki Vehtari write:

When building statistical models for Bayesian data analysis tasks, required and optional iterative adjustments and different modelling choices can give rise to numerous candidate models. In particular, checks and evaluations throughout the modelling process can motivate changes to an existing model or the consideration of alternative models to ultimately obtain models of sufficient quality for the problem at hand. Additionally, failing to consider alternative models can lead to overconfidence in the predictive or inferential ability of a chosen model. The search for suitable models requires modellers to work with multiple models without jeopardising the validity of their results. Multiverse analysis offers a framework for transparent creation of multiple models at once based on different sensible modelling choices, but the number of candidate models arising in the combination of iterations and possible modelling choices can become overwhelming in practice. Motivated by these challenges, this work proposes iterative filtering for multiverse analysis to support efficient and consistent assessment of multiple models and meaningful filtering towards fewer models of higher quality across different modelling contexts. Given that causal constraints have been taken into account, we show how multiverse analysis can be combined with recommendations from established Bayesian modelling workflows to identify promising candidate models by assessing predictive abilities and, if needed, tending to computational issues. We illustrate our suggested approach in different realistic modelling scenarios using real data examples.

They’re just getting started! Lots more needs to be done. I’ve been interested in the general idea for a while; the challenge is to get it working for some good examples and then to develop more general tools and abstract more general principles. As Riha et al. demonstrate, it can help to work in the directions of modeling and computation at the same time.
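
To make the filtering idea concrete, here’s a toy sketch (not the authors’ implementation): a small “multiverse” of candidate priors for a single Poisson rate, ranked by exact leave-one-out predictive density. The model names, hyperparameters, and data below are made up, and the conjugate Gamma-Poisson setup is chosen only because its LOO predictions have a closed form; the real workflow involves richer models and approximate LOO.

import numpy as np
from scipy.stats import nbinom

# A toy "multiverse": each candidate model is a different Gamma(shape, rate)
# prior on a Poisson rate. Hyperparameters are made up for illustration.
candidates = {
    "vague":       (0.5, 0.1),
    "weakly_inf":  (2.0, 1.0),
    "informative": (20.0, 10.0),
}

rng = np.random.default_rng(1)
y = rng.poisson(3.0, size=40)   # toy count data

def exact_loo_elpd(a, b, y):
    """Exact leave-one-out log predictive density for a conjugate
    Gamma(a, b) prior on a Poisson rate; each held-out point has a
    negative binomial LOO predictive distribution."""
    total, s, n = 0.0, y.sum(), len(y)
    for yi in y:
        a_loo = a + s - yi        # posterior shape without observation i
        b_loo = b + n - 1         # posterior rate without observation i
        # Gamma-Poisson mixture == nbinom(n=a_loo, p=b_loo/(b_loo+1))
        total += nbinom.logpmf(yi, a_loo, b_loo / (b_loo + 1.0))
    return total

# "Filter" the multiverse: rank candidates by predictive ability.
scores = {name: exact_loo_elpd(a, b, y) for name, (a, b) in candidates.items()}
for name, elpd in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} elpd_loo = {elpd:8.2f}")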

9 thoughts on “Supporting Bayesian modeling workflows with iterative filtering for multiverse analysis”

  1. The rigorous way to address the “multiverse” issue is to make surprising predictions. Not “postdictions” about data already available, but predictions about the future. And not vague predictions or stuff expected to happen anyway (“it will rain somewhere on Earth next year”).

    Like no one really trusted Newtonian mechanics until Halley’s comet returned very near the expected date. It was technically a wrong prediction, but no other model even came close.

    These proxy methods may work great for filtering the “multiverse”, but the proponents need to compare them to that gold standard before they’re worth looking into. At the very least, backtest them against a few historical examples.

    It’s just like independent replication. People seem willing to do anything to avoid it.

    * That’s also the core of Lakatos’ philosophy, and it’s what Bayes’ rule says to do, for anyone looking for theoretical justification.

    • Anoneuoid,

      I agree that this multiverse approach complicates a conceptually simple procedure for doing science: Think of a scientific model, estimate it, see if predictions match reality. Straightforward!

      Can you explain what Bayes’ rule has to do with it?

      • Suppose you have three models, two of which are kind of popular, and one of which is your fringe idea. Prior probabilities are 45%, 45%, and 10%.

        Now, your fringe 10% idea nearly exactly predicts something that happens next month, while the two popular models give near-zero probability for that thing. The posterior will be prior times likelihood, renormalized.

        The fringe idea gives the observed data probability near 1; the popular models give near 0. So the posterior probability for your fringe model will be much, much closer to 1.

        Predicting something as very likely to occur when alternative models say very unlikely is therefore a way to make your model stand out as the one that is successful. Don’t waste time on issues where all models predict similar probabilities.
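
        In code, with made-up numbers standing in for the “near 0” and “near 1” probabilities above:

        import numpy as np

        prior = np.array([0.45, 0.45, 0.10])   # two popular models, one fringe
        lik   = np.array([0.01, 0.01, 0.95])   # P(observed event | model), illustrative
        posterior = prior * lik / np.sum(prior * lik)
        print(posterior.round(3))              # -> [0.043 0.043 0.913]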

        • I would add that it also deals with the precision of the various models. Say for a coin flip there are three models: fair (F), double-heads (H), and double-tails (T).

          If we observe (O) the flip to be tails and have assigned equal priors (so the priors cancel and only the likelihoods matter):

          p(F|O) = 0.5/(0.5 + 0 + 1) = 1/3
          p(H|O) = 0.0/(0.5 + 0 + 1) = 0
          p(T|O) = 1.0/(0.5 + 0 + 1) = 2/3

          The “riskier” model is rewarded more for predicting the right thing. This is the real reason not to concern ourselves much with vague god/aliens explanations that spread the probability mass out over any conceivable observation.

          It’s also why people in fields like the social sciences and medicine are so touchy about “woo”. They know there is little distinguishing their current vague theories from the woo.
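
          The same arithmetic in Python, with one more tails flip added to show the riskier double-tails model pulling further ahead:

          import numpy as np

          prior = np.array([1/3, 1/3, 1/3])   # fair, double-heads, double-tails
          lik_T = np.array([0.5, 0.0, 1.0])   # P(tails | model)

          post = prior * lik_T / np.sum(prior * lik_T)
          print(post)      # -> 1/3, 0, 2/3

          # A second tails keeps rewarding the riskier model:
          post2 = post * lik_T / np.sum(post * lik_T)
          print(post2)     # -> 1/5, 0, 4/5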

        • Also see:

          Paul Meehl, “The Problem Is Epistemology, Not Statistics: Replace Significance Tests by Confidence Intervals and Quantify Accuracy of Risky Numerical Predictions.” In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What If There Were No Significance Tests? (pp. 393–425). Mahwah, NJ: Erlbaum, 1997.
          https://meehl.umn.edu/sites/meehl.umn.edu/files/files/169problemisepistemology.pdf

          He has another good paper on this, but I’m not recalling the name at the moment. He also doesn’t mention Bayes’ rule, IIRC, but that is essentially what he is talking about.

  2. This work is interesting, but as a practitioner, I’m having a hard time thinking about how it would fit into real workflows I’ve had.

    On the one hand, I’ve worked problems where “obvious” model variations presented themselves and it made sense to evaluate them, such as Poisson vs. negative binomial distributions for count data. My reading is that the “multiverse” version of this would be to pre-specify these model variations and evaluate them simultaneously. Fine, I suppose – I can see instances where that would be useful.

    Maybe this is a limitation of myself as a modeler, but I don’t really see myself coming up with very many model variations at the initial stages? Part of it, I think, is just feeling like I can only have so many different variations in my head at one time. At some point, I need to see the resulting fits to direct my thinking, especially when considering more substantial changes to the model.

    It’s also the case that in some more complex models, the problem areas were non-obvious before the fit and required careful evaluations to find them (and were issues that I did not predict beforehand, again maybe a me-issue).

    Perhaps the intention here, then, is to eventually look towards automating across this combinatorially large number of comparisons? To me it seems that would have a tendency to avoid some careful thought about the models of interest.

  3. This is not meant as a snarky question – it is good that the approach is transparent, but it still strikes me that it begins to verge on the Garden of Forking Paths or p-hacking: sort of try many models until you find one. I am not an expert on this, so any further insights people could provide would be appreciated.

    • Roy:

      Forking paths is a good thing! What’s bad is reporting only the best result from many paths. That’s bad partly because it’s misleading (for example, with p-values) and also because it discards information (from all the other paths).
