Bayesian workflow for disease transmission modeling in Stan!

Léo Grinsztajn, Elizaveta Semenova, Charles Margossian, and Julien Riou write:

This tutorial shows how to build, fit, and criticize disease transmission models in Stan, and should be useful to researchers interested in modeling the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic and other infectious diseases in a Bayesian framework. Bayesian modeling provides a principled way to quantify uncertainty and incorporate both data and prior knowledge into the model estimates. Stan is an expressive probabilistic programming language that abstracts the inference and allows users to focus on the modeling. As a result, Stan code is readable and easily extensible, which makes the modeler’s work more transparent. Furthermore, Stan’s main inference engine, Hamiltonian Monte Carlo sampling, is amenable to diagnostics, which means the user can verify whether the obtained inference is reliable. In this tutorial, we demonstrate how to formulate, fit, and diagnose a compartmental transmission model in Stan, first with a simple susceptible-infected-recovered model, then with a more elaborate transmission model used during the SARS-CoV-2 pandemic. We also cover advanced topics which can further help practitioners fit sophisticated models; notably, how to use simulations to probe the model and priors, and computational techniques to scale up models based on ordinary differential equations.

I love this workflow stuff!
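
To give a flavor of what the tutorial builds up to, here is a minimal sketch of the kind of susceptible-infected-recovered (SIR) model it covers, written against Stan's older array-based ODE interface (integrate_ode_rk45). Treat it as an illustration rather than the paper's exact code; the variable names and priors are just for orientation.

functions {
  real[] sir(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
    real S = y[1];
    real I = y[2];
    real R = y[3];
    real N = x_i[1];
    real beta = theta[1];
    real gamma = theta[2];
    real dS_dt = -beta * I * S / N;
    real dI_dt = beta * I * S / N - gamma * I;
    real dR_dt = gamma * I;
    return {dS_dt, dI_dt, dR_dt};
  }
}
data {
  int<lower=1> n_days;
  real y0[3];          // initial S, I, R
  real t0;
  real ts[n_days];
  int N;               // population size
  int cases[n_days];   // reported case counts
}
transformed data {
  real x_r[0];
  int x_i[1] = { N };
}
parameters {
  real<lower=0> beta;
  real<lower=0> gamma;
  real<lower=0> phi_inv;
}
transformed parameters {
  real y[n_days, 3];
  real phi = 1. / phi_inv;
  {
    real theta[2] = { beta, gamma };
    y = integrate_ode_rk45(sir, y0, t0, ts, theta, x_r, x_i);
  }
}
model {
  beta ~ normal(2, 1);
  gamma ~ normal(0.4, 0.5);
  phi_inv ~ exponential(5);
  // overdispersed counts centered on the infected compartment
  cases ~ neg_binomial_2(col(to_matrix(y), 2), phi);
}

The ODE right-hand side lives in the functions block, the solver call sits in transformed parameters, and a negative binomial observation model ties the latent epidemic curve to the reported cases.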

22 thoughts on “Bayesian workflow for disease transmission modeling in Stan!”

  1. I had to google what Stan is. The Wikipedia page says that Andrew Gelman is one of the inventors of the programming language. So, Andrew, why use Stan, when there are so many programming languages and so many libraries that, presumably, work similarly?

    • One key part of “why Stan” is that the Stan project was started around 2010 or 2011 and first released in 2012. At the time there were no “programming languages and … libraries that… work similarly”.

      In fact, much of the reason you have so many “similar” projects these days is BECAUSE they started working on Stan back then (and it inspired a lot of things).

      To give an example, Turing.jl is a system built entirely within Julia which offers a modeling language and a sampling engine, giving you very similar functionality. In general it’s maybe not quite as high performance, but it does offer much more functionality, including, for example, mixed sampling, so you can sample discrete parameters with Gibbs sampling and continuous ones with Hamiltonian Monte Carlo (using the NUTS algorithm), as well as the full power of the Julia language, including what is undoubtedly the best differential equation solving ecosystem in the world.

      But it probably wouldn’t exist without Stan as the impetus.

        • I see. But having used both I’m not clear on why that is an advantage. I basically treated them as the same thing with different syntax.

        • Then you have spared yourself the pleasure of trying to extend WinBUGS with beautiful WBDev scripts or the mental gymnastics of the ones- and zeroes-tricks that valiantly circumvent BUGS syntax to express models beyond the imagination of its developers.

          JAGS was/is considerably easier to extend in C++, but is still very limited in its expressive ability as a language.

        • It just depends on the type of models you want to deal with. For example, if you want to include ODEs, agent-based simulations, algebraic equation solving, or some kind of optimization, you can do those within the model in Stan.
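
          For instance, here is a rough sketch of embedding Stan’s built-in algebraic solver (algebra_solver) inside a model; the particular system being solved here is made up purely for illustration:

          functions {
            // made-up system: find the positive root of y * (y + theta) = c
            vector quad_root(vector y, vector theta, real[] x_r, int[] x_i) {
              vector[1] z;
              z[1] = y[1] * (y[1] + theta[1]) - x_r[1];
              return z;
            }
          }
          transformed data {
            vector[1] y_guess = [1.0]';
            real x_r[1] = { 2.0 };   // the constant c
            int x_i[0];
          }
          parameters {
            real<lower=0> a;
          }
          transformed parameters {
            vector[1] root = algebra_solver(quad_root, y_guess, [a]', x_r, x_i);
          }
          model {
            a ~ lognormal(0, 1);
            // root[1] can now enter the likelihood like any other transformed parameter
          }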

        • You can do all of that in many of the other current systems. Daniel Lee and I built our first ODE solvers just by autodiffing through 4th/5th order Runge-Kutta from Boost. Since then, we’ve implemented proper sensitivities and are currently in the middle of replacing our sensitivities with more efficient adjoint methods. From what I’ve heard, Julia’s now ahead of us in the ODE solver department with Turing.jl—they certainly have a much wider range of solvers implemented.

          The really big problem with BUGS and JAGS is the reliance on Gibbs sampling, the requirement of conjugacy for multivariate nodes, and the lack of extensibility. BUGS and JAGS still have an advantage in flexibility in allowing discrete sampling. PyMC3 is nice in that it allows discrete and continuous sampling to alternate. The problem is finding problems where discrete sampling will work. It’s much better to marginalize for efficiency when possible, but that’s hard on our users and leads to clunky-looking code compared to just having discrete random variables.
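
          To make the marginalization idiom concrete, here is a sketch (my illustration, not code from any of the packages discussed) of a two-component normal mixture in Stan where the discrete component indicator is summed out with log_mix instead of being sampled:

          data {
            int<lower=1> N;
            vector[N] y;
          }
          parameters {
            real<lower=0, upper=1> lambda;   // mixing proportion
            ordered[2] mu;                   // component means, ordered for identifiability
            vector<lower=0>[2] sigma;
          }
          model {
            mu ~ normal(0, 5);
            sigma ~ lognormal(0, 1);
            for (n in 1:N)
              target += log_mix(lambda,
                                normal_lpdf(y[n] | mu[1], sigma[1]),
                                normal_lpdf(y[n] | mu[2], sigma[2]));
          }

          The component indicator never appears as a parameter; its two values are averaged over on the log scale, which is what keeps the model in HMC-friendly continuous territory.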

        • In current systems, yes, but in 2011 or so it was JAGS and BUGS that were available, and doing these things was impossible in those systems. There were a few other general-purpose systems as well; in R, for example, you could use Geyer’s mcmc package. That’s how I did the fitting of ODEs in my dissertation, because Stan didn’t exist and neither did Julia.

        • Are you saying it’d be possible to do optimization as a part of how the model is defined in Stan? I mean something like this:

          function minimand(x, somedata) {
            // ... some objective involving x and somedata ...
          }

          and then, in the model block, use this as part of the model, for example like this:

          y ~ normal(mu, optimize(minimand, somedata = somedata))

          Here, mu would be a parameter but the standard deviation of the normal distribution would be found through minimizing the function defined earlier.

          Well, this is a silly example, I couldn’t come up with anything better, but I hope it communicates the idea.

          I was trying to find this in Stan’s manual the other day, but I couldn’t find it.

        • I was still a useless babby who couldn’t compute back in the days of BUGS and JAGS, but speculating from what I’ve read, it sounds like both products had a tendency to silently give the wrong answer when you did something unusual. It seems like they were built as something one step up from grad student projects rather than with the principled, production-ready, correctness-and-diagnostics-first philosophy of Stan.

        • I believe you completely, but playing around with pretty much anything before Stan and even many packages since, most of them feel more like cool proofs of concept or prototypes rather than something people are actually expected to use. At the risk of being mean, but to avoid being vague, the recent stuff I’m thinking about is Edward and especially tensorflow-probability. I honestly can’t imagine using tensorflow-probability to build anything complicated, and for some reason I have very little confidence in its algorithmic correctness either. (Though frankly, even basic tensorflow for the problem it’s designed for is pretty awkward, so it’s not just the probabilistic functionality.)

          Also, I can’t imagine tackling applied problems while trying to make sure the distributions are conjugate. That just feels like computation contaminating the model specification, which of course it does anyway even with Metropolis-based methods, but conjugacy is much more constraining. Choosing a prior distribution just to make conjugacy work out feels wrong.

          “some of my Ph.D. students in political science were using Bugs to fit their models. It turned out that Bugs’s modeling language was ideal for students who wanted to fit complex models but didn’t have a full grasp of the mathematics of likelihood functions, let alone Bayesian inference and integration.”

          Did Bayesian social scientists use to be expected to write their own samplers? I can’t imagine writing a thesis in an entirely different field while also having to think about the numerical properties of fourth-order Runge-Kutta and symplectic integrators as a footnote. Of course, I guess you did political science and computational statistics, but I assumed that was a pretty unusual case.

        • @somebody: I’m glad you like Stan. We’ve worked hard on things like error messages, correctness, and clean documentation. They’ve definitely matured over time.

          On the Python autodiff side, I’d suggest JAX. It’s much cleaner than TensorFlow. There are some nice tools like Oryx for transforms, and TensorFlow Probability can be layered over it. If JAX had existed when we wrote the Stan math library autodiff, we wouldn’t have bothered developing our own.

          Julia’s autodiff is pretty clean, too, which makes embeddings like Turing.jl nice to use.

          BUGS doesn’t require conjugate priors in the univariate case, but it’s much more efficient with conjugate priors because you can then do exact conditional sampling in Gibbs rather than falling back on the adaptive rejection sampler BUGS uses or the slice sampler that JAGS uses. The advantage of conjugate priors throughout is that models are easy to update online with new data. With non-conjugate priors, you either have to approximate or re-run everything.
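
          To spell out the online-updating point with the textbook beta-binomial example (my illustration, not part of the original comment): with a conjugate Beta prior on a success probability, the posterior is again a Beta whose parameters are just the prior counts plus the observed counts,

          \theta \sim \mathrm{Beta}(\alpha, \beta), \quad y \mid \theta \sim \mathrm{Binomial}(n, \theta) \;\Longrightarrow\; \theta \mid y \sim \mathrm{Beta}(\alpha + y,\, \beta + n - y),

          so a new batch of data only requires adding its success and failure counts to alpha and beta. Without conjugacy there is no such closed-form update, hence the approximation or the full re-run.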

          Yes, Bayesian social scientists used to write their own samplers. They still do. For example, check out Gill’s Bayesian Methods book. Those were mainly Gibbs or random-walk Metropolis samplers, not HMC. Andrew wrote his own samplers, but he has a Ph.D. in stats. Before we developed Stan, Andrew had a postdoc, Matt Schofield, who wrote an HMC sampler for tree ring data, but again, that was a stats postdoc, not a social scientist.

    • As Daniel Lakeland commented, there wasn’t anything that did what we wanted at the time (2010).

      BUGS was decades ahead of its time in the 1990s. But it’s coded in Component Pascal and not easily extensible. JAGS is a more modern reimplementation that replaces adaptive rejection sampling with slice sampling for non-conjugate univariate nodes. Matt Hoffman and I started by thinking about how we could make JAGS go faster. Not hard, because it’s implemented using R primitives very wastefully in terms of memory and there wasn’t any vectorized code at the time other than some GLM specializations (JAGS is still improving). The problem is that faster Gibbs wasn’t what we needed, because Gibbs doesn’t scale well in dimension or with correlated posteriors.

      I designed the Stan language based on BUGS and JAGS. The initial trick was figuring out how to interpret the sampling statements as log density increments. The second big trick was unconstraining transforms for all of our constrained parameters, so we could use an unconstrained form of Hamiltonian Monte Carlo. I found BUGS and JAGS programs hard to read, so I introduced static typing and separated the data, parameter, and model definitions into blocks for ease of reading. Once we did that, it was clear that we had a full programming language and could have local variables, user-defined functions, etc., and we started playing around with the flexibility that gives us over the way BUGS/JAGS constrain everything to stochastic and deterministic nodes in their graphical model, which is hugely wasteful in memory.
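
      To spell out that first trick (an illustrative fragment, not code from the original design work): a sampling statement is read as an increment to the accumulated log density, so the two model blocks below define the same posterior (the ~ form drops constant terms):

      model {
        y ~ normal(mu, sigma);                 // sampling-statement notation
      }

      model {
        target += normal_lpdf(y | mu, sigma);  // the same thing as an explicit log density increment
      }

      The unconstraining transforms mentioned above are what let a declaration like real<lower=0> sigma; be sampled on the unconstrained log scale, with the Jacobian adjustment handled automatically.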

      While I was developing the language, Matt Hoffman made the major breakthrough of figuring out how to auto-tune Hamiltonian Monte Carlo (HMC) with the no-U-turn sampler (NUTS). HMC with NUTS is what makes Stan so much more efficient statistically for most models than BUGS or JAGS. Careful coding is what makes it so much tighter in memory (we spend a lot of time profiling and looking at the assembly language generated by our low-level C++ code). If you were to ask me what will still be left of the contributions made during the Stan project, I think NUTS, the LKJ prior that Ben Goodrich developed for correlation matrices, and the basic approach of ADVI that Alp Kucukelbir developed are going to be with us long after the language is replaced with something better.
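
      For readers who haven’t run into it, the LKJ prior appears in a Stan program as a one-line statement on a correlation matrix, usually via its Cholesky factor; a small illustrative fragment:

      data {
        int<lower=1> K;
      }
      parameters {
        cholesky_factor_corr[K] L_Omega;   // Cholesky factor of a K x K correlation matrix
        vector<lower=0>[K] tau;            // per-dimension scales
      }
      model {
        L_Omega ~ lkj_corr_cholesky(2);    // shape > 1 pulls correlations toward zero
        tau ~ cauchy(0, 2.5);
        // covariance Cholesky factor when needed: diag_pre_multiply(tau, L_Omega)
      }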

      The only other system that existed when we started was AD Model Builder (ADMB), but we didn’t find it until much later. ADMB is closer to Stan in its basis in autodiff, but it’s more of a scripting design than a full language. ADMB is still very popular in fisheries and wildlife research and is still used by NOAA in production; we’ve helped them upgrade to the no-U-turn sampler.

      If something like JAX had existed when we started, we wouldn’t have tried building our own autodiff. We did think we could do better for our use case than the C++ autodiff tools that existed when we started, like Sacado, CppAD, and Adol-C.
