Bayesian workflow for disease transmission modeling in Stan!

Léo Grinsztajn, Elizaveta Semenova, Charles Margossian, and Julien Riou write:

This tutorial shows how to build, fit, and criticize disease transmission models in Stan, and should be useful to researchers interested in modeling the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic and other infectious diseases in a Bayesian framework. Bayesian modeling provides a principled way to quantify uncertainty and incorporate both data and prior knowledge into the model estimates. Stan is an expressive probabilistic programming language that abstracts the inference and allows users to focus on the modeling. As a result, Stan code is readable and easily extensible, which makes the modeler’s work more transparent. Furthermore, Stan’s main inference engine, Hamiltonian Monte Carlo sampling, is amenable to diagnostics, which means the user can verify whether the obtained inference is reliable. In this tutorial, we demonstrate how to formulate, fit, and diagnose a compartmental transmission model in Stan, first with a simple susceptible-infected-recovered model, then with a more elaborate transmission model used during the SARS-CoV-2 pandemic. We also cover advanced topics which can further help practitioners fit sophisticated models; notably, how to use simulations to probe the model and priors, and computational techniques to scale up models based on ordinary differential equations.

I love this workflow stuff!
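
To give a flavor of what the tutorial builds up to, here is a minimal sketch of the kind of susceptible-infected-recovered (SIR) model it covers, written against Stan's older array-based ODE interface (integrate_ode_rk45). Treat it as an illustration rather than the paper's exact code; the variable names and priors are just for orientation.

functions {
  real[] sir(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
    real S = y[1];
    real I = y[2];
    real R = y[3];
    real N = x_i[1];
    real beta = theta[1];
    real gamma = theta[2];
    real dS_dt = -beta * I * S / N;
    real dI_dt = beta * I * S / N - gamma * I;
    real dR_dt = gamma * I;
    return {dS_dt, dI_dt, dR_dt};
  }
}
data {
  int<lower=1> n_days;
  real y0[3];          // initial S, I, R
  real t0;
  real ts[n_days];
  int N;               // population size
  int cases[n_days];   // reported case counts
}
transformed data {
  real x_r[0];
  int x_i[1] = { N };
}
parameters {
  real<lower=0> beta;
  real<lower=0> gamma;
  real<lower=0> phi_inv;
}
transformed parameters {
  real y[n_days, 3];
  real phi = 1. / phi_inv;
  {
    real theta[2] = { beta, gamma };
    y = integrate_ode_rk45(sir, y0, t0, ts, theta, x_r, x_i);
  }
}
model {
  beta ~ normal(2, 1);
  gamma ~ normal(0.4, 0.5);
  phi_inv ~ exponential(5);
  // overdispersed counts centered on the infected compartment
  cases ~ neg_binomial_2(col(to_matrix(y), 2), phi);
}

The ODE right-hand side lives in the functions block, the solver call sits in transformed parameters, and a negative binomial observation model ties the latent epidemic curve to the reported cases.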

22 thoughts on “Bayesian workflow for disease transmission modeling in Stan!”

  1. I had to google what Stan is. The Wikipedia page says that Andrew Gelman is one of the inventors of the programming language. So, Andrew, why use Stan, when there are so many programming languages and so many libraries that, presumably, work similarly?

    • One key part of “why Stan” is that the Stan project was started around 2010 or 2011 and first released in 2012. At the time there were no “programming languages and … libraries that… work similarly”.

      In fact, much of the reason you have so many “similar” projects these days is BECAUSE they started working on Stan back then (and it inspired a lot of things).

      To give an example, Turing.jl is a system built entirely within Julia which offers a modeling language and a sampling engine, giving you very similar functionality. In general it’s maybe not quite as high performance, but it does offer much more functionality, including, for example, mixed sampling, so you can sample discrete parameters with Gibbs sampling and continuous ones with Hamiltonian Monte Carlo (using the NUTS algorithm), as well as the full power of the Julia language, including what is undoubtedly the best differential equation solving ecosystem in the world.

      But it probably wouldn’t exist without Stan as the impetus.

        • I see. But having used both I’m not clear on why that is an advantage. I basically treated them as the same thing with different syntax.

        • Then you have spared yourself the pleasure of trying to extend WinBUGS with beautiful WBDev scripts or the mental gymnastics of the ones- and zeroes-tricks that valiantly circumvent BUGS syntax to express models beyond the imagination of its developers.

          JAGS was/is considerably easier to extend in C++, but is still very limited in its expressive ability as a language.

        • It just depends on the type of models you want to deal with. For example, if you want to include ODEs, agent-based simulations, algebraic equation solving, or some kind of optimization, you can do those within the model in Stan.
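
          For instance, here is a rough sketch of embedding Stan’s built-in algebraic solver (algebra_solver) inside a model; the particular system being solved here is made up purely for illustration:

          functions {
            // made-up system: find the positive root of y * (y + theta) = c
            vector quad_root(vector y, vector theta, real[] x_r, int[] x_i) {
              vector[1] z;
              z[1] = y[1] * (y[1] + theta[1]) - x_r[1];
              return z;
            }
          }
          transformed data {
            vector[1] y_guess = [1.0]';
            real x_r[1] = { 2.0 };   // the constant c
            int x_i[0];
          }
          parameters {
            real<lower=0> a;
          }
          transformed parameters {
            vector[1] root = algebra_solver(quad_root, y_guess, [a]', x_r, x_i);
          }
          model {
            a ~ lognormal(0, 1);
            // root[1] can now enter the likelihood like any other transformed parameter
          }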

        • You can do all of that in many of the other current systems. Daniel Lee and I built our first ODE solvers just by autodiffing through 4th/5th order Runge-Kutta from Boost. Since then, we’ve implemented proper sensitivities and are currently in the middle of replacing our sensitivities with more efficient adjoint methods. From what I’ve heard, Julia’s now ahead of us in the ODE solver department with Turing.jl—they certainly have a much wider range of solvers implemented.

          The really big problem with BUGS and JAGS is the reliance on Gibbs sampling, the requirement of conjugacy for multivariate nodes, and the lack of extensibility. BUGS and JAGS still have an advantage in flexibility in allowing discrete sampling. PyMC3 is nice in that it allows discrete and continuous sampling to alternate. The problem is finding problems where discrete sampling will work. It’s much better to marginalize for efficiency when possible, but that’s hard on our users and leads to clunky-looking code compared to just having discrete random variables.
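
          To make the marginalization idiom concrete, here is a sketch (my illustration, not code from any of the packages discussed) of a two-component normal mixture in Stan where the discrete component indicator is summed out with log_mix instead of being sampled:

          data {
            int<lower=1> N;
            vector[N] y;
          }
          parameters {
            real<lower=0, upper=1> lambda;   // mixing proportion
            ordered[2] mu;                   // component means, ordered for identifiability
            vector<lower=0>[2] sigma;
          }
          model {
            mu ~ normal(0, 5);
            sigma ~ lognormal(0, 1);
            for (n in 1:N)
              target += log_mix(lambda,
                                normal_lpdf(y[n] | mu[1], sigma[1]),
                                normal_lpdf(y[n] | mu[2], sigma[2]));
          }

          The component indicator never appears as a parameter; its two values are averaged over on the log scale, which is what keeps the model in HMC-friendly continuous territory.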

        • In current systems, yes, but in 2011 or so it was JAGS and BUGS that were available, and doing these things was impossible in those systems. There were a few other general-purpose systems as well; in R, for example, you could use Geyer’s mcmc package. That’s how I did the fitting of ODEs in my dissertation, because Stan didn’t exist and neither did Julia.

        • Are you saying it’d be possible to do optimization as a part of how the model is defined in Stan? I mean something like this:

          function minimand(x, somedata) {
            // ... some objective involving x and somedata ...
          }

          and then, in the model block, use this as part of the model, for example like this:

          y ~ normal(mu, optimize(minimand, somedata = somedata))

          Here, mu would be a parameter but the standard deviation of the normal distribution would be found through minimizing the function defined earlier.

          Well, this is a silly example, I couldn’t come up with anything better, but I hope it communicates the idea.

          I was trying to find this in Stan’s manual the other day, but I couldn’t find it.

        • I was still a useless babby who couldn’t compute back in the days of BUGS and JAGS, but speculating from what I’ve read, it sounds like both products had a tendency to silently give the wrong answer when you did something unusual. It seems like they were built as something one step up from grad student projects rather than with the principled, production-ready, correctness-and-diagnostics-first philosophy of Stan.

        • I believe you completely, but playing around with pretty much anything before Stan and even many packages since, most of them feel more like cool proofs of concept or prototypes rather than something people are actually expected to use. At the risk of being mean, but to avoid being vague, the recent stuff I’m thinking about is Edward and especially tensorflow-probability. I honestly can’t imagine using tensorflow-probability to build anything complicated, and for some reason I have very little confidence in its algorithmic correctness either. (Though frankly, even basic tensorflow for the problem it’s designed for is pretty awkward, so it’s not just the probabilistic functionality.)

          Also, I can’t imagine tackling applied problems while trying to make sure the distributions are conjugate. That just feels like computation contaminating the model specification, which of course it does anyway even with Metropolis-based methods, but conjugacy is much more constraining. Choosing a prior distribution just to make conjugacy work out feels wrong.

          “some of my Ph.D. students in political science were using Bugs to fit their models. It turned out that Bugs’s modeling language was ideal for students who wanted to fit complex models but didn’t have a full grasp of the mathematics of likelihood functions, let alone Bayesian inference and integration.”

          Did Bayesian social scientists use to be expected to write their own samplers? I can’t imagine writing a thesis in an entirely different field while also having to think about the numerical properties of fourth-order Runge-Kutta and symplectic integrators as a footnote. Of course, I guess you did political science and computational statistics, but I assumed that was a pretty unusual case.

        • @somebody: I’m glad you like Stan. We’ve worked hard on things like error messages, correctness, and clean documentation. They’ve definitely matured over time.

          On the Python autodiff side, I’d suggest JAX. It’s much cleaner than TensorFlow. There are some nice tools like Oryx for transforms, and TensorFlow Probability can be layered over it. If JAX had existed when we wrote the Stan math library autodiff, we wouldn’t have bothered developing our own.

          Julia’s autodiff is pretty clean, too, which makes embeddings like Turing.jl nice to use.

          BUGS doesn’t require conjugate priors in the univariate case, but it’s much more efficient with conjugate priors because you can then do exact conditional sampling in Gibbs rather than falling back on the adaptive rejection sampler BUGS uses or the slice sampler that JAGS uses. The advantage of conjugate priors throughout is that models are easy to update online with new data. With non-conjugate priors, you either have to approximate or re-run everything.
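
          To spell out the online-updating point with the textbook beta-binomial example (my illustration, not part of the original comment): with a conjugate Beta prior on a success probability, the posterior is again a Beta whose parameters are just the prior counts plus the observed counts,

          \theta \sim \mathrm{Beta}(\alpha, \beta), \quad y \mid \theta \sim \mathrm{Binomial}(n, \theta) \;\Longrightarrow\; \theta \mid y \sim \mathrm{Beta}(\alpha + y,\, \beta + n - y),

          so a new batch of data only requires adding its success and failure counts to alpha and beta. Without conjugacy there is no such closed-form update, hence the approximation or the full re-run.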

          Yes, Bayesian social scientists used to write their own samplers. They still do. For example, check out Gill’s Bayesian Methods book. Those were mainly Gibbs or random-walk Metropolis samplers, not HMC. Andrew wrote his own samplers, but he has a Ph.D. in stats. Before we developed Stan, Andrew had a postdoc, Matt Schofield, who wrote an HMC sampler for tree ring data, but again, that was a stats postdoc, not a social scientist.

    • As Daniel Lakeland commented, there wasn’t anything that did what we wanted at the time (2010).

      BUGS was decades ahead of its time in the 1990s. But it’s coded in Component Pascal and not easily extensible. JAGS is a more modern reimplementation that replaces adaptive rejection sampling with slice sampling for non-conjugate univariate nodes. Matt Hoffman and I started by thinking about how we could make JAGS go faster. Not hard, because it’s implemented using R primitives very wastefully in terms of memory and there wasn’t any vectorized code at the time other than some GLM specializations (JAGS is still improving). The problem is that faster Gibbs wasn’t what we needed, because Gibbs doesn’t scale well in dimension or with correlated posteriors.

      I designed the Stan language based on BUGS and JAGS. The initial trick was figuring out how to interpret the sampling statements as log density increments. The second big trick was unconstraining transforms for all of our constrained parameters, so we could use an unconstrained form of Hamiltonian Monte Carlo. I found BUGS and JAGS programs hard to read, so I introduced static typing and separated the data, parameter, and model definitions into blocks for ease of reading. Once we did that, it was clear that we had a full programming language and could have local variables, user-defined functions, etc., and we started playing around with the flexibility that gives us over the way BUGS/JAGS constrain everything to stochastic and deterministic nodes in their graphical model, which is hugely wasteful in memory.
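
      To spell out that first trick (an illustrative fragment, not code from the original design work): a sampling statement is read as an increment to the accumulated log density, so the two model blocks below define the same posterior (the ~ form drops constant terms):

      model {
        y ~ normal(mu, sigma);                 // sampling-statement notation
      }

      model {
        target += normal_lpdf(y | mu, sigma);  // the same thing as an explicit log density increment
      }

      The unconstraining transforms mentioned above are what let a declaration like real<lower=0> sigma; be sampled on the unconstrained log scale, with the Jacobian adjustment handled automatically.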

      While I was developing the language, Matt Hoffman made the major breakthrough of figuring out how to auto-tune Hamiltonian Monte Carlo (HMC) with the no-U-turn sampler (NUTS). HMC with NUTS is what makes Stan so much more efficient statistically for most models than BUGS or JAGS. Careful coding is what makes it so much tighter in memory (we spend a lot of time profiling and looking at the assembly language generated by our low-level C++ code). If you were to ask me what will still be left of the contributions made during the Stan project, I think NUTS, the LKJ prior that Ben Goodrich developed for correlation matrices, and the basic approach of ADVI that Alp Kucukelbir developed are going to be with us long after the language is replaced with something better.
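
      For readers who haven’t run into it, the LKJ prior appears in a Stan program as a one-line statement on a correlation matrix, usually via its Cholesky factor; a small illustrative fragment:

      data {
        int<lower=1> K;
      }
      parameters {
        cholesky_factor_corr[K] L_Omega;   // Cholesky factor of a K x K correlation matrix
        vector<lower=0>[K] tau;            // per-dimension scales
      }
      model {
        L_Omega ~ lkj_corr_cholesky(2);    // shape > 1 pulls correlations toward zero
        tau ~ cauchy(0, 2.5);
        // covariance Cholesky factor when needed: diag_pre_multiply(tau, L_Omega)
      }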

      The only other system that existed when we started was AD Model Builder (ADMB), but we didn’t find it until much later. ADMB is closer to Stan in its basis in autodiff, but it’s more of a scripting design than a full language. ADMB is still very popular in fisheries and wildlife research and is still used by NOAA in production; we’ve helped them upgrade to the no-U-turn sampler.

      If something like JAX had existed when we started, we wouldn’t have tried building our own autodiff. We did think we could do better for our use case than the C++ autodiff tools that existed when we started, like Sacado, CppAD, and Adol-C.
