Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Paul-Christian Bürkner, Lauren Kennedy, Jonah Gabry, Martin Modrák, and I write:

The Bayesian approach to data analysis provides a powerful way to handle uncertainty in all observations, model parameters, and model structure using probability theory. Probabilistic programming languages make it easier to specify and fit Bayesian models, but this still leaves us with many options regarding constructing, evaluating, and using these models, along with many remaining challenges in computation. Using Bayesian inference to solve real-world problems requires not only statistical skills, subject matter knowledge, and programming, but also awareness of the decisions made in the process of data analysis. All of these aspects can be understood as part of a tangled workflow of applied Bayesian statistics. Beyond inference, the workflow also includes iterative model building, model checking, validation and troubleshooting of computational problems, model understanding, and model comparison. We review all these aspects of workflow in the context of several examples, keeping in mind that in practice we will be fitting many models for any given problem, even if only a subset of them will ultimately be relevant for our conclusions.

This is a long article (77 pages! So long it has its own table of contents!) because we had a lot that we wanted to say. We were thinking about some of these ideas a few months ago, and a few years earlier, and a few years before that. Our take on workflow follows a long tradition of applied Bayesian model building, going back to Mosteller and Wallace (and probably to Laplace before that), and it also relates to S and R and the tidyverse and other statistical computing environments. We’re trying to take ideas of good statistical practice and bring them into the tent, as it were, of statistical methodology.

We see three benefits to this research program. First, by making explicit various aspects of what we consider to be good practice, we can open the door to further developments, in the same way that the explicit acknowledgment of “exploratory data analysis” led to improved methods for data exploration, and in the same way that formalizing the idea of “hierarchical models” (instead of considering various tricks for estimating the prior distribution from the data) has led to more sophisticated multilevel models. Second, laying out a workflow is a step toward automation of these important steps of statistical analysis. Third, we would like our computational tools to work well with real workflows, handling the multiplicity of models that we fit in any serious applied project. We consider this Bayesian Workflow article to be a step in these directions.

Maybe tangentially related – I was wondering if you might have some comments to offer:

-snip-

We applied an ensemble of 16 Bayesian models to vital statistics data to estimate the all-cause mortality effect of the pandemic for 21 industrialized countries.

https://www.nature.com/articles/s41591-020-1112-0

Can you explain what this means?

In section 5.4

> transform the gradient of the parameters to the gradient of the expected data

In reading the paper, I tried to think of a catchy but thoughtful description.

Came up with – Purpose Informed and Economical Empirical Inquiry – PIE-EI

Not bad.

Looks like a good set of notes for the last 2/3 of a graduate-level course.

This is an enjoyable read. To me, it’s more a methodology than a workflow. More impressive that way, and most of your criticisms are about inappropriate or erroneous methodologies, where “inappropriate” ranges from explicit mucking with the data to implicit mucking that turns into error.

I enjoy the discussion; for example, when you talk about generative models, it reads as practitionally practiced.

Thanks!

Can someone define “practitionally” as used in Jonathan’s last sentence? (A web search wasn’t helpful.)

Well, practitionally must be the adverb form of practitional.

Practitional is a rare word dating from the 17th century that means practical, according to lexico.com.

So the sense would be “as practiced in the real world.”

Alternatively, practitionally could be a neologism based on the word practitioner.

In which case the sense would be “as practiced by a skilled expert.”

Or we could just merge all the meanings and come up with my preferred translation:

“it practically reads the way practical practitioners would practice practicing.”

Ya coulda done better, say with

“it practically proceeds per practical practitioners’ proposed procedure of practicing.”

I can get in another p-word ;-)

“it practically proceeds per practical practitioners’ proposed procedure of _purposeful_ practicing.”

This article may also be of interest to readers. The debt to Michael Betancourt’s work should be obvious. The paper was written after Michael taught a course on Bayesian methods at Potsdam (Potsdam, Germany, not Potsdam, New York). We wanted to give a practical example that “Cognitive Scientists” like myself can use.

Daniel J. Schad, Michael Betancourt, and Shravan Vasishth. Towards a principled Bayesian workflow: A tutorial for cognitive science. Psychological Methods, 2020. In press.

The paper is paywalled, and I don’t think I have the right to share a copy, as it is an APA-controlled journal. But we do have all the code and data under our control: https://osf.io/b2vx9/

Shravan:

Thanks. It’s good to see applications of these ideas to particular research areas.

Each time we write about the topic we get a slightly different focus. When I gave that talk in 2009 and 2011 (the last link in the above post), I was focused on the common structure underlying posterior predictive model checking, simulation-based fake-data checking, and model building: all can be viewed as extensions, in different ways, of the “graphical model” or conditional independence structure that traditionally has been set up for a single model. The idea is to expand the joint distribution beyond p(y,theta) to include y_rep, theta_rep, y_fake, and parameters in different models. At the time, I thought the topic was very important but I couldn’t figure out a good way to write it up. Then in 2017, with Jonah Gabry and others we expressed some of these ideas in the context of statistical graphics and visualization; this ultimately became the Gabry et al. (2019) discussion paper, Visualization in Bayesian workflow, which we referred to in our article. Bayesian workflow is about much more than visualization, but this gave us an entry point. In your paper, cognitive science is your entry point. In our recent Bayesian Workflow paper, our entry point is computing. It would’ve been hard for us to write this article back in 2009 because at that time we were not thinking about having a unified computing environment. Indeed, I’ve been talking about fake data simulation for a long time but only recently has it fully entered my workflow.
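To make the fake-data idea above concrete, here is a minimal, hypothetical sketch (not taken from the paper; the normal model and all numbers are illustrative): draw a parameter from the prior, simulate fake data from the model, compute the posterior, and check that the posterior recovers the simulated “truth.”

```python
import numpy as np

rng = np.random.default_rng(1)

# Fake-data check for a normal model with known sigma:
# theta ~ Normal(mu0, tau0), y | theta ~ Normal(theta, sigma).
sigma = 1.0           # known data sd
mu0, tau0 = 0.0, 2.0  # prior mean and sd for theta
n = 200

theta_true = rng.normal(mu0, tau0)              # draw "truth" from the prior
y_fake = rng.normal(theta_true, sigma, size=n)  # simulate fake data

# Conjugate posterior for theta: Normal(post_mean, post_sd).
prec = 1 / tau0**2 + n / sigma**2
post_mean = (mu0 / tau0**2 + y_fake.sum() / sigma**2) / prec
post_sd = prec ** -0.5

# If the fit is working, theta_true should sit within a few posterior sds.
z = (theta_true - post_mean) / post_sd
print(f"theta_true={theta_true:.3f}, posterior={post_mean:.3f} (sd {post_sd:.3f}), z={z:.2f}")
```

With a real model you would replace the analytic posterior by MCMC draws, but the logic of the check is the same.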

Professor Gelman,

Thank you for the great paper! I have been following your (and Aki’s) work on Bayesian workflow for a while, and this paper seems to gather a lot of the research together. I was wondering: 1) What would change in the workflow if, instead of “only” doing statistical inference, you would also like to do causal inference? 2) If the changes are big, are you aware of a paper similar to this one that tackles the problem of a “causal inference” workflow?

Thank you!

I think causality is implicit in (good) fake data simulation, in that all the processes that lead to the data coming about (and into your possession) need to be fully specified in probability models and constraints.
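As a hypothetical illustration of that point (this example is mine, not from the paper or Greenland): if the simulation makes the treatment-assignment process explicit, it also makes visible why a naive comparison is biased while an adjusted one is not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the full data-generating process, including assignment:
# a confounder x drives both treatment assignment and the outcome.
n = 20000
x = rng.normal(size=n)               # confounder
p_treat = 1 / (1 + np.exp(-2 * x))   # assignment probability depends on x
t = rng.binomial(1, p_treat)         # treatment indicator
tau = 1.0                            # true treatment effect
y = 2 * x + tau * t + rng.normal(size=n)

# Naive difference in means is confounded by x.
naive = y[t == 1].mean() - y[t == 0].mean()

# Adjusting for x (least squares on [1, t, x]) recovers tau.
X = np.column_stack([np.ones(n), t, x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
adjusted = beta[1]
print(f"naive={naive:.2f}, adjusted={adjusted:.2f}, truth={tau}")
```

Because the assignment mechanism is written down explicitly, it is unambiguous which analysis of the simulated data corresponds to the causal question.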

However, probably would be better to make it explicit as is done here – Greenland. Causal foundations for probability in statistics. 2020. https://arxiv.org/ftp/arxiv/papers/2011/2011.02677.pdf

Thank you! Will definitely read after I finish the Bayesian workflow one.

I never understood the term “fake data simulation.” The data aren’t “real,” measured from some biological or social process. But they aren’t “fake” either; they’re simulated data. I don’t think “fake” and “real” are mutually exclusive.

In the psychiatry literature, I’ve noticed that since 2018 or so they’ve been using “psychopathology” to refer to any deviation from normal neural development. It makes sense if you break down the word: psycho, something about the mind; pathology, something to do with disease. So disorders like schizophrenia, ADHD, and so on are all psychopathologies. This is a good change.

No, psychopaths aren’t all Patrick Bateman in American Psycho.

But we’re writing for other people, not ourselves. If we’re not careful with the way we define things, this can propagate pseudo-science, misinformation, etc. It’s our fault as the scientists.

It’s not the layman’s fault for “not working hard enough” or “not getting enough degrees.” It’s our fault because we’re not making our information easily accessible to everyone.

The purpose of writing is to be clear, easily understood, and easily accessible. Things should be short, when possible.

It’s not “fake data simulation,” it’s simulated data.

I agree; I have started calling it simulated data. The word fake seems very charged and too informal. I’ve had people ask me why I am faking data.

Andrew, thank you for such a great guide to Bayesian data analysis. This work is a fine progression from “Visualization in Bayesian Workflow,” Gabry et al., and it nicely complements Michael Betancourt’s consulting advice and writings, from which I have benefitted greatly.

Even as I continue to read through and digest it, I’ve already used the broad strokes of the “Bayesian Workflow”, as shown in your Figure 1, in discussions with graduate students. Here’s what I presented (as a tip of the iceberg) just yesterday:

See the PDF of slides. Most notably, I’ve refashioned your Figure 1 as a flowchart on page 3 of the PDF.

Finally (picking nits), I think the phrase “tangled workflow” doesn’t do justice to what is quite often a systematic progression through the workflow activities. I understand one may jump from box to box without necessarily following the arrows, but I think the canonical workflow is quite straightforward and helps to build discipline into one’s practice. Sure, as with any workflow or methodology, experts take shortcuts and know when it’s promising to jump around, reordering the steps. Yet there is great value in teaching the canonical workflow to students, with ample discussion to avoid getting mindlessly cookbook-y.

Looking forward to the evolution of this “workflow” into a “method” and beyond!

Thank you for this very interesting article. It helps to see the material presented in this fashion, and it affirms my sense that I should go back and really understand my model better.

I very much like the discussion of simulation-based calibration. For the projects I have worked on, it is one of the most challenging aspects, as I have dealt with high-dimensional problems with considerable costs for each likelihood evaluation. I have never read more about it, but I would suspect that the calculations involved in, e.g., the discovery of the Higgs should be (i) rather important to calibrate but (ii) incredibly hard (computationally expensive) to check. Maybe they have done some work on how to do SBC with a small number of draws.
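For readers unfamiliar with SBC, here is a bare-bones sketch of the core idea using a conjugate normal model where the posterior can be sampled exactly (with MCMC you would replace the exact draws by sampler output; all model choices and numbers here are illustrative, not from the paper): if inference is calibrated, the rank of the prior draw among posterior draws is uniform.

```python
import numpy as np

rng = np.random.default_rng(42)

# SBC for theta ~ Normal(0, 1), y_i | theta ~ Normal(theta, 1):
# repeatedly draw theta from the prior, simulate data, draw L posterior
# samples (analytic here), and record the rank of theta among the draws.
n_sims, L, n_obs = 1000, 99, 10
ranks = np.empty(n_sims, dtype=int)
for s in range(n_sims):
    theta = rng.normal()                          # prior draw
    y = rng.normal(theta, 1.0, size=n_obs)        # simulated data
    post_sd = (1 + n_obs) ** -0.5                 # conjugate posterior sd
    post_mean = y.sum() / (1 + n_obs)             # conjugate posterior mean
    draws = rng.normal(post_mean, post_sd, size=L)
    ranks[s] = (draws < theta).sum()              # rank in {0, ..., L}

# Crude uniformity check: chi-square statistic over 10 rank bins.
counts, _ = np.histogram(ranks, bins=10, range=(0, L + 1))
chi2 = ((counts - n_sims / 10) ** 2 / (n_sims / 10)).sum()
print(f"chi-square over 10 bins: {chi2:.1f} (roughly 9 expected if calibrated)")
```

The cost concern above is visible here: each of the `n_sims` replications needs a full posterior fit, which is exactly what becomes expensive when one likelihood evaluation is already costly.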

—-

PS:

I’m sure other people have pointed this out already, but in case you did not notice, I had the following remarks:

p. 9: Words missing after “by the proportion of volume that the liver”

p. 18: Could you elaborate on the following statement: “Bayesian inference will in general only be calibrated when averaging over the prior, not for any single parameter value”

Fig 9: Figure title of the right panel: Un/b/alanced

Fig. 12: The /dashed/ (not dotted?) line is barely visible, especially in the printed version. Maybe you could use a different color code and/or a dashed line for the “true” model.