Stan Weekly Roundup, 7 July 2017

Holiday weekend, schmoliday weekend.

  • Ben Goodrich and Jonah Gabry shipped RStan 2.16.2 (their numbering is a little beyond base Stan, which is at 2.16.0). This reintroduces error reporting that got lost in the 2.15 refactor, so please upgrade if you want to debug your Stan programs!
  • Joe Haupt translated the JAGS examples in the second edition of John Kruschke’s book Doing Bayesian Data Analysis into Stan. Kruschke blogged it and Haupt has a GitHub page with the Stan programs. I still owe him some comments on the code.
  • Andrew Gelman has been working on the second edition of his and Jennifer Hill’s regression book, which is being rewritten as two linked books and translated to Stan. He’s coordinating with Jonah Gabry and Ben Goodrich on the RStanArm replacements for lme4 and lm/glm in R.
  • Sean Talts got in the pull request for enabling C++11/C++14 in Stan. This is huge for us developers, as we've had a lot of pent-up demand for C++11 features sitting on the back burner.
  • Michael Betancourt, with feedback from the NumFOCUS advisory board for Stan, put together a web page of guidelines for using the Stan trademarks.
  • Gianluca Baio released version 1.0.5 of survHE, a survival analysis package based on RStan (and INLA and ShinyStan). There’s also the GitHub repo that Jacki Buros Novik made available with a library of survival analysis models in Stan. Techniques from these packages will probably make their way into RStanArm eventually (Andrew’s putting in a survival analysis example in the new regression book).
  • Mitzi Morris finished testing the Besag-York-Mollie model in Stan, and it passes the Cook-Gelman-Rubin diagnostics. Given that GeoBUGS gets a different answer, we now think GeoBUGS is wrong, but those tests haven't finished running yet (GeoBUGS is much slower than Stan in terms of effective sample size per unit time if you want to get to convergence).
  • Imad Ali has been working with Mitzi on getting the BYM model into RStanArm.
  • Jonah Gabry taught a one-day Stan class in Padua (Italy) while on vacation. That’s how much we like talking about Stan.
  • Ben Goodrich just gave a Stan talk at the Brussels useR conference group, following close on the heels of his Berlin meetup. You can find a lot of information about upcoming events at our events page.

  • Mitzi Morris and Michael Betancourt will be teaching a one-day Stan course for the Women in Machine Learning meetup event in New York on 22 July 2017 hosted by Viacom. Dan Simpson’s comment on the blog post was priceless.
  • Martin Černý improved the feature he wrote to implement a standalone function parser for Stan (to make it easier to expose functions in R and Python).
  • Aki Vehtari arXived a new version of the horseshoe prior paper with a parameter to control regularization more tightly, especially for logistic regression. It has the added benefit of being more robust and removing divergent transitions in the Hamiltonian simulation. Look for that to land in RStanArm soon.
  • Charles Margossian continues to make speed improvements on the Stan models for Torsten and is also working on getting the algebraic equation solver into Stan so we can do fixed points of diff eqs and other fun applications. If you follow the link to the pull request, you can also see my extensive review of the code. It’s not easy to put a big feature like this into Stan, but we provide lots of help.
  • Marco Inacio got in a pull request for definite numerical integration. Needless to say, there are all sorts of subtle numerical issues swirling around integration. Marco is working from John Cook's basic implementation of numerical integration, and John's been nice enough to offer it under a BSD license so that it's compatible with Stan.
  • Rayleigh Lei is working on vectorizing all the binary functions and has a branch with the testing framework. This is really hairy template programming, but probably a nice break after his first year of grad school at U. Michigan!
  • Allen Riddell and Ari Hartikainen have been working hard on Windows compatibility for PyStan, which is no walk in the park. Windows has been the bane of our existence since we started this project, and if all the world's applied statisticians switched to Unix (Linux or Mac OS X), we wouldn't shed a tear.
  • Yajuan Si, Andrew Gelman, Rob Trangucci, and Jonah Gabry have been working on a survey weighting module for RStanArm. Sounds like RStanArm's quickly becoming the Swiss Army knife (Leatherman?) of Bayesian modeling.
  • Andrew Gelman finished a paper on (issues with) NHST and is wondering about clinical effects that are small by design because they're being compared to the state-of-the-art treatment as a baseline.
  • My own work on mixed-mode tests continues apace. The most recent pull request adds logical operators (and, or, not) to our autodiff library (they've been available in Stan all along; this just rounds out the math library's operators directly) and removes 4000 lines of old code (replacing them with 1000 new lines, but that includes doc and three operators in both forward and reverse mode). I'm optimistic that this will eventually be done and we'll have RHMC and autodiff Laplace approximations.
  • Ben Bales submitted a pull request for appending arrays, which is under review and will be generalized to arbitrary Stan array types.
  • Ben Bales also submitted a pull request for the initial vectorization of RNGs. This should make programs doing posterior predictive inference so much cleaner.
  • I wrote a functional spec for standalone generated quantities. This would let us do posterior predictive inference after fitting the model. As you can see, even simple things like this take a lot of work. The effort estimates in that spec are conservative on a task-by-task basis, but given the correlations among tasks, they're probably not so conservative in total.
  • I also patched potentially undefined bools in Stan; who knew that an uninitialized bool member of a C++ class could show up holding values like 127? This followed on from Ben Goodrich filing the issue after some picky R compiler flagged the undefined behavior. Not a bug, but the code's cleaner now.
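Ben Bales's vectorized RNGs are easiest to appreciate by analogy. What follows is a NumPy sketch, not Stan code and not Stan's RNG signatures: given S posterior draws for a hypothetical simple linear regression (alpha, beta, sigma), it simulates one replicated data set per draw with a single broadcast call instead of a double loop over draws and observations. The function name `posterior_predictive` and the "draws" below are made up for illustration.

```python
import numpy as np


def posterior_predictive(alpha, beta, sigma, x, seed=None):
    """Simulate y_rep[s, n] ~ normal(alpha[s] + beta[s] * x[n], sigma[s]).

    alpha, beta, sigma: 1-D arrays holding S posterior draws each.
    x: 1-D array of N predictor values.
    Returns an S x N array with one replicated data set per posterior draw.
    (Hypothetical helper for illustration; not part of Stan or PyStan.)
    """
    rng = np.random.default_rng(seed)
    alpha, beta, sigma, x = (np.asarray(a, dtype=float) for a in (alpha, beta, sigma, x))
    # Linear predictor for every (draw, observation) pair via broadcasting:
    # (S, 1) draws combined with a (1, N) predictor row give an (S, N) matrix.
    mu = alpha[:, None] + beta[:, None] * x[None, :]
    # One vectorized RNG call replaces the double loop over s and n;
    # the (S, 1) scale column broadcasts across each row of mu.
    return rng.normal(loc=mu, scale=sigma[:, None])


# Usage with made-up "posterior draws" (not from any real fit).
S = 1000
draws_alpha = np.full(S, 1.0)
draws_beta = np.full(S, 2.0)
draws_sigma = np.full(S, 0.5)
x_new = np.array([0.0, 1.0, 2.0])
y_rep = posterior_predictive(draws_alpha, draws_beta, draws_sigma, x_new, seed=1)
print(y_rep.shape)  # (1000, 3)
```

The loop version would call the RNG S × N times with scalar arguments; the vectorized version states the whole generative step in one line, which is the kind of cleanup vectorized RNGs promise for generated quantities blocks.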

A Primer on Bayesian Multilevel Modeling using PyStan

Chris Fonnesbeck contributed our first PyStan case study (I wrote the abstract), in the form of a very nice Jupyter notebook. Daniel Lee and I had the pleasure of seeing him present it live as part of a course we were doing at Vanderbilt last week.

This case study replicates the analysis of home radon levels using hierarchical models of Lin, Gelman, Price, and Kurtz (1999). It illustrates how to generalize linear regressions to hierarchical models with group-level predictors and how to compare predictive inferences and evaluate model fits. Along the way it shows how to get data into Stan using pandas, how to sample using PyStan, and how to visualize the results using Seaborn.
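The key step in generalizing linear regression to hierarchical models is partial pooling: each county's estimate is pulled toward the overall mean, more strongly when the county has few measurements. The following is a back-of-the-envelope NumPy sketch of the standard precision-weighted formula, not code from the case study. It assumes the data-level standard deviation `sigma_y` and the county-level standard deviation `sigma_alpha` are known, whereas a Stan fit estimates them jointly with the county intercepts. The function name `partial_pooling` and the toy data are hypothetical.

```python
import numpy as np


def partial_pooling(y, group, sigma_y, sigma_alpha):
    """Precision-weighted (partially pooled) group means, assuming known variances.

    Each group's estimate averages its own sample mean, weighted by
    n_j / sigma_y**2, with the overall mean, weighted by 1 / sigma_alpha**2.
    Returns (group labels, pooled estimates).
    (Hypothetical helper for illustration; a Stan fit estimates the sds too.)
    """
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    ybar_all = y.mean()  # complete-pooling estimate
    labels = np.unique(group)
    n = np.array([np.sum(group == g) for g in labels])
    ybar = np.array([y[group == g].mean() for g in labels])  # no-pooling estimates
    w_data = n / sigma_y**2  # more observations -> trust the group mean more
    w_pool = 1.0 / sigma_alpha**2  # smaller between-group sd -> shrink harder
    return labels, (w_data * ybar + w_pool * ybar_all) / (w_data + w_pool)


# Toy numbers, not the radon data: three measurements in county 0, one in county 1.
labels, est = partial_pooling([1.0, 1.2, 0.8, 3.0], [0, 0, 0, 1], sigma_y=1.0, sigma_alpha=1.0)
print(labels, est)
```

With these toy numbers the overall mean is about 1.5, so county 0 moves only from 1.0 to about 1.125, while county 1, with a single measurement, is pulled from 3.0 to about 2.25. Letting `sigma_alpha` grow recovers the no-pooling estimates, and letting it shrink toward zero recovers complete pooling.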

As an added bonus, if you follow the link to the source repo on GitHub, you’ll find a Gaussian process case study. I haven’t even had time to look at it yet, but if it’s as nice as this radon study, it’ll be well worth checking out.


P.S. If you're wondering what one of the core PyMC developers was doing writing PyStan examples, it was because he invited us to teach a course on RStan at Vanderbilt to his biostatistics colleagues who didn't want to learn Python. It was extremely generous of him to put promoting good science ahead of promoting his own software! Part of our class was on teaching Bayesian methods and how to code models in Stan, and Chris offered to do some case studies, which is what Andrew usually does when he's the third instructor. Chris said he tried RStan, but then bailed and went back to Python, where he could use familiar and powerful Python tools like pandas, NumPy, and Seaborn. It's hard to motivate learning a whole new language and toolchain just to write one example. The benefit to us is that we now have a great PyStan example. Thanks, Chris!