The current state of the Stan ecosystem in R

(This post is by Jonah)

Last week I posted here about the release of version 2.0.0 of the loo R package, but there have been a few other recent releases and updates worth mentioning. At the end of the post I also include some general thoughts on R package development with Stan and the growing number of Stan users who are releasing their own packages interfacing with rstan or one of our other packages.

Interfaces

rstanarm and brms: Version 2.17.4 of rstanarm and version 2.2.0 of brms were both released to provide compatibility with the new features in loo v2.0.0. Two of the new vignettes for the loo package show how to use it with rstanarm models, and we have also just released a draft of a vignette on how to use loo with brms and rstan for many “non-factorizable” models (i.e., observations not conditionally independent). brms is also now officially supported by the Stan Development Team (welcome Paul!) and there is a new category for it on the Stan Forums.

rstan: The next release of the rstan package (v2.18), is not out yet (we need to get Stan 2.18 out first), but it will include a loo() method for stanfit objects in order to save users a bit of work. Unfortunately, we can’t save you the trouble of having to compute the point-wise log-likelihood in your Stan program though! There will also be some new functions that make it a bit easier to extract HMC/NUTS diagnostics (thanks to a contribution from Martin Modrák).

Visualization

bayesplot: A few weeks ago we released version 1.5.0 of the bayesplot package (mc-stan.org/bayesplot), which also integrates nicely with loo 2.0.0. In particular, the diagnostic plots using the leave-one-out cross-validated probability integral transform (LOO-PIT) from our paper Visualization in Bayesian Workflow (preprint on arXiv, code on GitHub) are easier to make with the latest bayesplot release. Also, TJ Mahr continues to improve the bayesplot experience for ggplot2 users by adding (among other things) more functions that return the data used for plotting in a tidy data frame.

shinystan: Unfortunately, there hasn’t been a shinystan (mc-stan.org/shinystan) release in a while because I’ve been busy with all of these other packages, papers, and various other Stan-related things. We’ll try to get out a release with a few bug fixes soon. (If you’re annoyed by the lack of new features in shinystan recently let me know and I will try to convince you to help me solve that problem!)

(Update: I forgot to mention that despite the lack of shinystan releases, we’ve been working on better introductory materials. To that end, Chelsea Muth, Zita Oravecz, and I recently published an article User-friendly Bayesian regression modeling: A tutorial with rstanarm and shinystan (view).)

Other tools

loo: We released version 2.0.0, a major update to the loo package (mc-stan.org/loo). See my previous blog post.

projpred: Version 0.8.0 of the projpred package (mc-stan.org/projpred) for projection predictive variable selection for GLMs was also released shortly after the loo update in order to take advantage of the improvements to the Pareto smoothed importance sampling algorithm. projpred can already be used quite easily with rstanarm models and we are working on improving its compatibility with other packages for fitting Stan models.

rstantools: Unrelated to the loo update, we also released version 1.5.0 of the rstantools package (mc-stan.org/rstantools), which provides functions for setting up R packages interfacing with Stan. The major changes in this release are that usethis::create_package() is now called to set up the package (instead of utils::package.skeleton), fewer manual changes to files are required by users after calling rstan_package_skeleton(), and we have a new vignette walking through the process of setting up a package (thanks Stefan Siegert!). Work is being done to keep improving this process, so be on the lookout for more updates soonish.

Stan related R packages from other developers

There are now well over fifty packages on CRAN that depend in some way on one of our R packages mentioned above!  You can find most of them by looking at the “Reverse dependencies” section on the CRAN page for rstan, but that doesn’t count the ones that depend on bayesplot, shinystanloo, etc., but not rstan.

Unfortunately, given the growing number of these packages, we haven’t been able to look at each one of them in detail. For obvious reasons we prioritize giving feedback to developers who reach out to us directly to ask for comments and to those developers who make an effort to our recommendations for developers of R packages interfacing with Stan (included with the rstantools package since its initial release in 2016). If you are developing one of these packages and would like feedback please let us know on the Stan Forums. Our time is limited but we really do make a serious effort to answer every single question asked on the forums (thank you to the many Stan users who also volunteer their time helping on the forums!).

My primary feelings about this trend of developing Stan-based R packages are ones of excitement and gratification. It’s really such an honor to have so many people developing these packages based on all the work we’ve done! There are also a few things I’ve noticed that I hope will change going forward. I’ll wrap up this post by highlighting two of these issues that I hope developers will take seriously:

(1) Unit testing

(2) Naming user-facing functions

The number of these packages that have no unit tests (or very scant testing) is a bit scary. Unit tests won’t catch every possible bug (we have lots of tests for our packages and people still find bugs all the time), but there is really no excuse for not unit testing a package that you want other people to use. If you care enough to do everything required to create your package and get it on CRAN, and if you care about your users, then I think it’s fair to say that you should care enough to write tests for your package. And there’s really no excuse these days with the availability of packages like testthat to make this process easier than it used to be! Can anyone think of a reasonable excuse for not unit testing a package before releasing it to CRAN and expecting people to use it? (Not a rhetorical question. I really am curious given that it seems to be relatively common or at least not uncommon.) I don’t mean to be too negative here. There are also many packages that seem to have strong testing in place! My motivation for bringing up this issue is that it is in the best interest of our users.

Regarding function naming: this isn’t nearly as big of a deal as unit testing, it’s just something I think developers (including myself) of packages in the Stan R ecosystem can do to make the experience better for our users. rstanarm and brms both import the generic functions included with rstantools in order to be able to define methods with consistent names. For example, whether you fit a model with rstanarm or with brms, you can call log_lik() on the fitted model object to get the pointwise log-likelihood (it’s true that we still have a bit left to do to get the names across rstanarm and brms more standardized, but we’re actively working on it). If you are developing a package that fits models using Stan, we hope you will join us in trying to make it as easy as possible for users to navigate the Stan ecosystem in R.

Stan Weekly Roundup, 21 July 2017

It was another productive week in Stan land. The big news is that

  • Jonathan Auerbach, Tim Jones, Susanna Makela, Swupnil Sahai, and Robin Winstanley won first place in a New York City competition for predicting elementary school enrollment. Jonathan told me, “I heard 192 entered, and there were 5 finalists….Of course, we used Stan (RStan specifically). … Thought it might be Stan news worthy.”

    I’d say that’s newsworthy. Jon also provided a link to the “challenge” page, a New York City government sponsored “call for innovations”: Enhancing School Zoning Efforts by Predicting Population Change.

    They took home a US$20K paycheck for their efforts!

    Stan’s seeing quite a lot of use these days among demographers and others looking to predict forward from time series data. Jonathan’s been very active using government data sets (see his StanCon 2017 presentation with Rob Trangucci, Twelve Cities: Does lowering speed limits save pedestrian lives?). Hopefully they’ll share their code—maybe they already have. I really like to see this combination of government data and careful statistical analysis.

In other big news coming up soon,

In other news,

  • Andrew Gelman‘s been driving a lot of rethinking of our interfaces because he’s revising his and Jennifer Hill’s regression book (the revision will be two books). Specifically, he’s thinking a lot about workflow and how we can manage model expansion by going from fixed to modeled parameters. Right now, the process is onerous in that you have to move data variables into the parameters block and keep updating their priors. Andrew wants to be able to do this from the outside, but Michael Betancourt and I both expressed a lot of skepticism in terms of it breaking a lot of our fundamental abstractions (like a Stan program defining the density that’s fit!). More to come on this hot topic. Any ideas on how to manage developing a model would be appreciated. This goes back to the very first thing Matt Hoffman, Michael Malecki and I worked on with Andrew when he hired us before we’d conceived Stan. You’d think we’d have better advice on this after all this time. I’ve seen people do everything from use the C++ preprocessor to write elaborate program generation code in R.

  • Breck Baldwin has been working on governance and we’re converging on a workable model that we’ll share with everyone soon. The goal’s to make the governance clear and less of a smoke-filled room job by those of us who happen to go to lunch after the weekly meetings.

  • Jonah Gabry is taking on the ggplot2-ification of the new regression book and trying to roll everything into a combination of RStanArm and BayesPlot. No word yet if the rest of the tidyverse is to follow. Andrew said, “I’ll see what Jonah comes up with” or something to that effect.

  • Jonah has alos been working on priors for multilevel regression and poststratification with Yajuan Si (former postdoc of Andrew’s, now at U. Wisconsin); the trick is doing somethign reasonable when you have lots of interactions.

  • Ben Goodrich has been working on the next release of RStanArm. It’s beginning to garner a lot of contributions. Remember that the point has been to convert a lot of common point estimation packages to Bayesian versions and supply them with familiar R interfaces. Ben’s particularly been working on survival analysis lately.

  • Our high school student intern (don’t know if I can mention names online—the rest of our developers are adults!) is working on applying the Cook-Gelman-Rubin metric to evaluating various models. We’re doing much more of this method and it needs a less verbose and more descriptive name!

  • Mitzi Morris submitted a pull request for the Stan repo to add line numbers to error messages involving compound declare-and-define statements.
  • A joint effort by Mitzi Morris, Dan Simpson, Imad Ali, and Miguel A Martinez Beneito has led to convergence of models and fits among BUGS, Stan, and INLA for intrinsic conditional autorgression (ICAR) models. Imad’s building the result in RStanArm and has figured out how to extend the loo (leave one out cross-validation) package to deal with spatial models. Look for it in a Stan package near you soon. Mitzi’s working on the case study, which has been updated in the example-models repo.

  • Charles Margossian knocked off a paper on the mixed ODE solver he’s been working on, with a longer paper promised that goes through all the code details. Not sure if that’s on arXiv or not. He’s also been training Bill Gillespie to code in C++, which is great news for the project since Charles has to contend with his first year of grad school next year (whereas Bill’s facing a pleasant retirement of tracking down memory leaks). Charles is also working on getting the algebraic and fixed state solvers integrated into Torsten before the fall.

  • Krzysztof Sakrejda has a new paper out motivating a custom density he wrote in Stan for tracking dealys in diseseas incidence in a countrywide analysis for Thailand. I’m not sure where he put it.

  • Yuling Yao is revising the stacking paper (for a kind of improved model averaging). I believe this is going into the loo package (or is maybe already there). So much going on with Stan I can’t keep up!

  • Yuling also revised the simulated tempering paper, which is some custom code on top of Stan to fit models with limited multimodality. There was some discussion about realistic examples with limited multimodality and we hope to have a suite of them to go along with the paper.

  • Sebastian Weber, Bob Carpenter, and Micahel Betancourt, and Charles Margossian had a long and productive meeting to design the MPI interface. It’s very tricky trying to find an interface that’ll fit into the Stan language and let you ship off data once and reuse it. I think we’re almost there. The design discussion’s on Discourse.

  • Rob Trangucci is finishing the GP paper (after revising the manual chapter) with Dan Simpson, Aki Vehtari, and Michael Betancourt.

I’m sure I’m missing a lot of goings on, especially among people not at our weekly meetings. If you know of things that should be on this list, please let me know.

Stan Weekly Roundup, 7 July 2017

Holiday weekend, schmoliday weekend.

  • Ben Goodrich and Jonah Gabry shipped RStan 2.16.2 (their numbering is a little beyond base Stan, which is at 2.16.0). This reintroduces error reporting that got lost in the 2.15 refactor, so please upgrade if you want to debug your Stan programs!
  • Joe Haupt translated the JAGS examples in the second edition of John Kruschke’s book Doing Bayesian Data Analysis into Stan. Kruschke blogged it and Haupt has a GitHub page with the Stan programs. I still owe him some comments on the code.
  • Andrew Gelman has been working on the second edition of his and Jennifer Hill’s regression book, which is being rewritten as two linked books and translated to Stan. He’s coordinating with Jonah Gabry and Ben Goodrich on the RStanArm replacements for lme4 and lm/glm in R.
  • Sean Talts got in the pull request for enabling C++11/C++14 in Stan. This is huge for us developers as we have a lot of pent-up demand for C++11 features on the back burner.
  • Michael Betancourt, with feedback from the NumFOCUS advisory board for Stan, put together a web page of guidelines for using the Stan trademarks.
  • Gianluca Baio released version 1.0.5 of survHE, a survival analysis package based on RStan (and INLA and ShinyStan). There’s also the GitHub repo that Jacki Buros Novik made available with a library of survival analysis models in Stan. Techniques from these packages will probably make their way into RStanArm eventually (Andrew’s putting in a survival analysis example in the new regression book).
  • Mitzi Morris finished testing the Besag-York-Mollie model in Stan and it passes the Cook-Gelman-Rubin diagnostics. Given that GeoBUGS gets a different answer, we now think it’s wrong, but those tests haven’t completed running yet (it’s much slower than Stan in terms of effective sample size per unit time if you want to get to convergence).
  • Imad Ali has been working with Mitzi on getting the BYM model into RStanArm.
  • Jonah Gabry taught a one-day Stan class in Padua (Italy) while on vacation. That’s how much we like talking about Stan.
  • Ben Goodrich just gave a Stan talk at the Brussels useR conferece group following close on the heels of his Berlin meetup. You can find a lot of information about upcoming events at our events page.

  • Mitzi Morris and Michael Betancourt will be teaching a one-day Stan course for the Women in Machine Learning meetup event in New York on 22 July 2017 hosted by Viacom. Dan Simpson’s comment on the blog post was priceless.
  • Martin Černý improved feature he wrote to implement a standalone function parser for Stan (to make it easier to expose functions in R and Python).
  • Aki Vehtari arXived a new version of the horseshoe prior paper with a parameter to control regularization more tightly, especially for logistic regression. It has the added benefit of being more robust and removing divergent transitions in the Hamiltonian simulation. Look for that to land in RStanArm soon.
  • Charles Margossian continues to make speed improvements on the Stan models for Torsten and is also working on getting the algebraic equation solver into Stan so we can do fixed points of diff eqs and other fun applications. If you follow the link to the pull request, you can also see my extensive review of the code. It’s not easy to put a big feature like this into Stan, but we provide lots of help.
  • Marco Inacio got in a pull request for definite numerical integration. There are needless to say all sorts of subtle numerical issues swirling around integrating. Marco is working from John Cook’s basic implemnetation of numerical integration and John’s been nice enough to offer it under a BSD license so it would be compatible with Stan.
  • Rayleigh Lei is working on vectorizing all the binary functions and has a branch with the testing framework. This is really hairy template programming, but probably a nice break after his first year of grad school at U. Michigan!
  • Allen Riddell and Ari Hartikainen have been working hard on Windows compatibility for PyStan, which is no walk in the park. Windows has been the bane of our existence since starting this project and if all the world’s applied statisticians switched to Unix (Linux or Mac OS X), we wouldn’t shed a tear.
  • Yajuan Si, Andrew Gelman, Rob Trangucci, and Jonah Gabry have been working on a survey weighting module for RStanArm. Sounds like RStanArm’s quickly becoming the swiss army knife (leatherman?) of Bayesian modeling.
  • Andrew Gelman finished a paper on (issues with) NHST and is wondering about clinical effects that are small by design because they’re being compared to the state of the art treatment as a baseline.
  • My own work on mixed mode tests continues apace. The most recent pull request adds logical operators (and, or, not) to our autodiff library (it’s been in Stan—this is just rounding out the math lib operators directly) and removed 4000 lines of old code (replacing it with 1000 new lines, but that includes doc and three operators in both forward and reverse mode). I’m optimistic that this will eventually be done and we’ll have RHMC and autodiff Laplace approximations.
  • Ben Bales submitted a pull request for appending arrays, which is under review and will be generalized to arbitary Stan array types.
  • Ben Bales also submitted a pull request for the initial vectorization of RNGs. This should make programs doing posterior predictive inference so much cleaner.
  • I wrote a functional spec for standalone generated quantities. This would let us do posterior predictive inference after fitting the model. As you can see, even simple things like this take a lot of work. That spec is conservative on a task-by-task basis, but given the correlations among tasks, probably not so conservative in total.
  • I also patched potentially undefined bools in Stan; who knew that C++ would initialize a bool in a class to values like 127. This followed on from Ben Goodrich filing the issue after some picky R compiler flagged some undefined behavior. Not a bug, but the code’s cleaner now.

McElreath’s Statistical Rethinking: A Bayesian Course with Examples in R and Stan

We’re not even halfway through with January, but the new year’s already rung in a new book with lots of Stan content:

This one got a thumbs up from the Stan team members who’ve read it, and Rasmus Bååth has called it “a pedagogical masterpiece.”

The book’s web site has two sample chapters, video tutorials, and the code.

The book is based on McElreath’s R package rethinking, which is available from GitHub with a nice README on the landing page.

If the cover looks familiar, that’s because it’s in the same series as Gelman et al.’s Bayesian Data Analysis.

rstanarm and more!

Ben Goodrich writes:

The rstanarm R package, which has been mentioned several times on stan-users, is now available in binary form on CRAN mirrors (unless you are using an old version of R and / or an old version of OSX). It is an R package that comes with a few precompiled Stan models — which are called by R wrapper functions that have the same syntax as popular model-fitting functions in R such as glm() — and some supporting R functions for working with posterior predictive distributions. The files in its demo/ subdirectory, which can be called via the demo() function, show how you can fit essentially all of the models in Gelman and Hill’s textbook

http://stat.columbia.edu/~gelman/arm/

and rstanarm already offers more (although not strictly a superset of the) functionality in the arm R package.

The rstanarm package can be installed in the usual way with

install.packages(“rstanarm”)

which does not technically require the computer to have a C++ compiler if you on Windows / Mac (unless you want to build it from source, which might provide a slight boost to the execution speed). The vignettes explain in detail how to use each of the model fitting functions in rstanarm. However, the vignettes on the CRAN website

https://cran.r-project.org/web/packages/rstanarm/index.html

do not currently show the generated images, so call browseVignettes(“rstanarm”). The help(“rstarnarm-package”) and help(“priors”) pages are also essential for understanding what rstanarm does and how it works. Briefly, there are several model-fitting functions:

  • stan_lm() and stan_aov(), which just calls stan_lm(), use the same likelihood as lm() and aov() respectively but add regularizing priors on the coefficients
  • stan_polr() uses the same likelihood as MASS::polr() and adds regularizing priors on the coefficients and, indirectly, on the cutpoints. The stan_polr() function can also handle binary outcomes and can do scobit likelihoods.
  • stan_glm() and stan_glm.nb() use the same likelihood(s) as glm() and MASS::glm.nb() and respectively provide a few options for priors
  • stan_lmer(), stan_glmer(), stan_glmer.nb() and stan_gamm4() use the same likelihoods as lme4::lmer(), lme4::glmer(), lme4::glmer.nb(), and gamm4::gamm4() respectively and basically call stan_glm() but add regularizing priors on the covariance matrices that comprise the blocks of the block-diagonal covariance matrix of the group-specific parameters. The stan_[g]lmer() functions accept all the same formulas as lme4::[g]lmer() — and indeed use lme4’s formula parser — and stan_gamm4() accepts all the same formulas as gamm::gamm4(), which can / should include smooth additive terms such as splines

If the objective is merely to obtain and interpret results and one of the model-fitting functions in rstanarm is adequate for your needs, then you should almost always use it. The Stan programs in the rstanarm package are better tested, have incorporated a lot of tricks and reparameterizations to be numerically stable, and have more options than what most Stan users would implement on their own. Also, all the model-fitting functions in rstanarm are integrated with posterior_predict(), pp_check(), and loo(), which are somewhat tedious to implement on your own. Conversely, if you want to learn how to write Stan programs, there is no substitute for practice, but the Stan programs in rstanarm are not particularly well-suited for a beginner to learn from because of all their tricks / reparameterizations / options.

Feel free to file bugs and feature requests at

https://github.com/stan-dev/rstanarm/issues

If you would like to make a pull request to add a model-fitting function to rstanarm, there is a pretty well-established path in the code for how to do that but it is spread out over a bunch of different files. It is probably easier to contribute to rstanarm, but some developers may be interested in distributing their own CRAN packages that come with precompiled Stan programs that are focused on something besides applied regression modeling in the social sciences. The Makefile and cleanup scripts in the rstanarm package show how this can be accomplished (which took weeks to figure out), but it is easiest to get started by calling rstan::rstan_package_skeleton(), which sets up the package structure and copies some stuff from the rstanarm GitHub repository.

On behalf of Jonah who wrote half the code in rstanarm and the rest of the Stan Development Team who wrote the math library and estimation algorithms used by rstanarm, we hope rstanarm is useful to you.

Also, Leon Shernoff pointed us to this post by Wayne Folta, delightfully titled “R Users Will Now Inevitably Become Bayesians,” introducing two new R packages for fitting Stan models:  rstanarm and brms.  Here’s Folta:

There are several reasons why everyone isn’t using Bayesian methods for regression modeling. One reason is that Bayesian modeling requires more thought . . . A second reason is that MCMC sampling . . . can be slow compared to closed-form or MLE procedures. A third reason is that existing Bayesian solutions have either been highly-specialized (and thus inflexible), or have required knowing how to use a generalized tool like BUGS, JAGS, or Stan. This third reason has recently been shattered in the R world by not one but two packages: brms and rstanarm. Interestingly, both of these packages are elegant front ends to Stan, via rstan and shinystan. . . . You can install both packages from CRAN . . .

He illustrates with an example:

mm <- stan_glm (mpg ~ ., data=mtcars, prior=normal (0, 8))
mm  #===> Results
stan_glm(formula = mpg ~ ., data = mtcars, prior = normal(0, 
    8))

Estimates:
            Median MAD_SD
(Intercept) 11.7   19.1  
cyl         -0.1    1.1  
disp         0.0    0.0  
hp           0.0    0.0  
drat         0.8    1.7  
wt          -3.7    2.0  
qsec         0.8    0.8  
vs           0.3    2.1  
am           2.5    2.2  
gear         0.7    1.5  
carb        -0.2    0.9  
sigma        2.7    0.4  

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 20.1    0.7  

Note the more sparse output, which Gelman promotes. You can get more detail with summary (br), and you can also use shinystan to look at most everything that a Bayesian regression can give you. We can look at the values and CIs of the coefficients with plot (mm), and we can compare posterior sample distributions with the actual distribution with: pp_check (mm, "dist", nreps=30):

Posterior Check

This is all great.  I’m looking forward to never having to use lm, glm, etc. again.  I like being able to put in priors (or, if desired, no priors) as a matter of course, to switch between mle/penalized mle and full Bayes at will, to get simulation-based uncertainty intervals for any quantities of interest, and to be able to build out my models as needed.

Stan 2.9 is Here!

Stan logo

We’re happy to announce that Stan 2.9.0 is fully available(1) for CmdStan, RStan, and PyStan — it should also work for Stan.jl (Julia), MatlabStan, and StataStan. As usual, you can find everything you need on the

The main new features are:

  • R/MATLAB-like slicing of matrices. There’s a new chapter in the user’s guide part of manual up front explaining how it all works (and more in the language reference on the nitty-gritty details). This means you can write foo[xs] where xs is an array of integers and use explicit slicing, as with bar[1:3, 2] and baz[:3, , xs] and so on.
  • Variational inference is available on an experimental basis in RStan and PyStan, and the adaptation has been improved; we still don’t have a good handle on when variational inference will work and when it won’t, so we would strongly advise only using it for rough work and then verifying with MCMC.
  • Better-behaved unit-vector transform; alas, this is broken already due to a dimensionality mismatch and you’ll have to wait for Stan 2.9.1 or Stan 2.10 before the unit_vector type will actually work (it never worked in the past, either—our bad in both the past and now for not having enough tests around it).

We also fixed some minor bugs and cleaned up quite a bit of the code and build process.

We also would like to welcome two new developers: Krzysztof Sakrejda and Aki Vehtari. Aki’s been instrumental in many of our design discussions and Krzysztof’s first major code contribution was sparse matrix multiplication, which leads to our next topic.

We have also released the first version of RStanARM package. The short story on RStanARM is that it’s an MCMC and VB-based replacement for lm() and glm() from core R, and to some extent, lmer() and glmer() from lme4. I believe there’s also a new version of ShinyStan (2.1) available.

We also wrote up a paper on Stan’s reverse-mode automatic differentiation, the cornerstone of the Stan Math Library:

Sincerely,

The Stan Development Team


(1) Apologies to those of you who tried to download and install RStan as it was trickling through the CRAN process. The problem is that the managers of CRAN felt a single RStan package was too large (4MB or so) and forced us to import existing packages and break RStan down (BH for the Boost headers, RcppEigen for the Eigen headers, StanHeaders for the Stan header files, and RStan itself for RStan itself). Alas, they provide no foolproof way to synchronize releases. We can insist on a particular version, but R always tries to download the latest or just fails. In the future, we’ll be more proactive and let people know ahead of time when things are in an unsettled state on CRAN and how to install through GitHub. Thanks for your patience.

ShinyStan v2.0.0

For those of you not familiar with ShinyStan, it is a graphical user interface for exploring Stan models (and more generally MCMC output from any software). For context, here’s the post on this blog first introducing ShinyStan (formerly shinyStan) from earlier this year.

shinystan_images

ShinyStan v2.0.0 released

ShinyStan v2.0.0 is now available on CRAN. This is a major update with a new look and a lot of new features. It also has a new(ish) name: ShinyStan is the app/GUI and shinystan the R package (both had formerly been shinyStan for some reason apparently not important enough for me to remember). Like earlier versions, this version has enhanced functionality for Stan models but is compatible with MCMC output from other software packages too.

You can install the new version from CRAN like any other package:

install.packages("shinystan")

If you prefer a version with a few minor typos fixed you can install from Github using the devtools package:

devtools::install_github("stan-dev/shinystan", build_vignettes = TRUE)

(Note: after installing the new version and checking that it works we recommend removing the old one by running remove.packages(“shinyStan”).)

If you install the package and want to try it out without having to first fit a model you can launch the app using the preloaded demo model:

library(shinystan)
launch_shinystan_demo()

Notes

This update contains a lot of changes, both in terms of new features added, greater UI stability, and an entirely new look. Some release notes can be found on GitHub and there are also some instructions for getting started on the ShinyStan wiki page. Here are two highlights:

  • The new interactive diagnostic plots for Hamiltonian Monte Carlo. In particular, these are designed for models fit with Stan using NUTS (the No-U-Turn Sampler).

    Diagnostics screenshot Diagnostics screenshotshinystan_diagnostics3

  • The deploy_shinystan function, which lets you easily deploy ShinyStan apps for your models to RStudio’s ShinyApps hosting service. Each of your apps (i.e. each of your models) will have a unique URL. To use this feature please also install the shinyapps package: devtools::install_github("rstudio/shinyapps").

The plan is to release a minor update with bug fixes and other minor tweaks in a month or so. So if you find anything we should fix or change (or if you have any other suggestions) we’d appreciate the feedback.

Stan 2.5, now with MATLAB, Julia, and ODEs

stanlogo-main

As usual, you can find everything on the

Drop us a line on the stan-users group if you have problems with installs or questions about Stan or coding particular models.

New Interfaces

We’d like to welcome two new interfaces:

The new interface home pages are linked from the Stan home page.

New Features

The biggest new feature is a differential equation solver (Runge-Kutta from Boost’s odeint with coupled sensitivities). We also added new cbind and rbind functions, is_nan and is_inf functions, a num_elements function, a mechanism to throw exceptions to reject samples with printed messages, and two new distributions, the Frechet and 2-parameter Pareto (both contributed by Alexey Stukalov).

Backward Compatibility

Stan 2.5 is fully backward compatible with earlier 2.x releases and will remain so until Stan 3 (which is not yet designed, much less scheduled).

Revised Manual

In addition to the ODE documentation, there is a new chapter on marginalizing discrete latent parameters with several example models, new sections on regression priors for coefficients and noise scale in ordinary, hierarchical, and multivariate settings, along with new chapters on all the algorithms used by Stan for MCMC sampling, optimization, and diagnosis, with configuration information and advice.

Preview of 2.6 and Beyond

Our plans for major features in the near future include stiff ODE solvers, a general MATLAB/R-style array/matrix/vector indexing and assignment syntax, and uncertainty estimates for penalized maximum likelihood estimates via Laplace approximations with second-order autodiff.

Release Notes

Here are the release notes.


v2.5.0 (20 October 2014)
======================================================================

New Features
----------------------------------------
* ordinary differential equation solver, implemented by coupling
  the user-specified system with its sensitivities (#771)
* add reject() statement for user-defined rejections/exceptions (#458)
* new num_elements() functions that applies to all containers (#1026)
* added is_nan() and is_inf() functions (#592)
* nested reverse-mode autodiff, primarily for ode solver (#1031)
* added get_lp() function to remove any need for bare lp__  (#470)
* new functions cbind() and rbind() like those in R (#787)
* added modulus function in a way tht is consistent with integer
  division across platforms (#577)
* exposed pareto_type_2_rng (#580)
* added Frechet distribution and multi_gp_cholesky distribution 
  (thanks to Alexey Stukalov for both)

Enhancements 
----------------------------------------
* removed Eigen code insertion for numeric traits and replaced
  with order-independent metaprogram (#1065)
* cleaned up error messages to provide clearer error context
  and more informative messages (#640)
* extensive tests for higher order autodiff in densities (#823)
* added context factory 
* deprecated lkj_cov density (#865)
* trying again with informational/rejection message (#223)
* more code moved from interfaces into Stan common libraries,
  including a var_context factory for configuration
* moved example models to own repo (stan-dev/example-models) and
  included as submodule for stan-dev/stan (#314)
* added per-iteration interrupt handler to BFGS optimizer (#768)
* worked around unused function warnings from gcc (#796)
* fixed error messages in vector to array conversion (#579, thanks
  Kevin S. Van Horn)
* fixed gp-fit.stan example to be as efficient as manual 
  version (#782)
* update to Eigen version 3.2.2 (#1087)

Builds
----------------------------------------
* pull out testing into Python script for developers to simplify
  makes
* libstan dependencies handled properly and regenerate
  dependencies, including working around bug in GNU 
  make 3.8.1 (#1058, #1061, #1062)

Bug Fixes
----------------------------------------
* deal with covariant return structure in functions (allows
  data-only variables to alternate with parameter version);  involved
  adding new traits metaprograms promote_scalar and
  promote_scalar_type (#849)
* fixed error message on check_nonzero_size (#1066)
* fix arg config printing after random seed generation (#1049)
* logical conjunction and disjunction operators short circuit (#593)
* clean up parser bug preventing variables starting with reserved
  names (#866)
* fma() function calls underlying platform fma (#667)
* remove upper bound on number of function arguments (#867)
* cleaned up code to remove compiler warnings (#1034)
* share likely() and unlikely() macros to avoid redundancy warnings (#1002)
* complete review of function library for NaN behavior and consistency
  of calls for double and autodiff values, with extensive
  documentation and extensive new unit tests for this and other,
  enhances NaN testing in built-in test functions (several dozen issues 
  in the #800 to #902 range)
* fixing Eigen assert bugs with NO_DEBUG in tests (#904)
* fix to makefile to allow builds in g++ 4.4 (thanks to Ewan Dunbar)
* fix precedence of exponentiation in language (#835)
* allow size zero inputs in data and initialization (#683)

Documentation
----------------------------------------
* new chapter on differential equation solver
* new sections on default priors for regression coefficients and
  scales, including hierarchical and multivariate based on
  full Cholesky parameterization
* new part on algorithms, which chapters on HMC/NUTS, optimization,
  and diagnostics
* new chapter on models with latent discrete parameters
* using latexmk through make for LaTeX compilation
* changed page numbers to beg contiguous throughout so page 
  numbers match PDF viewer page number
* all user-supplied corrections applied from next-manual issue
* section on identifiability with priors, including discussion of K-1
  parameterization of softmax and IRT
* new section on convergence monitoring
* extensive corrections from Andrew Gelman on regression models
  and notation
* added discussion of hurdle model in zero inflation section
* update built-in function doc to clarify several behaviors (#1025)

A Stan is Born

Stan 1.0.0 and RStan 1.0.0

It’s official. The Stan Development Team is happy to announce the first stable versions of Stan and RStan.

What is (R)Stan?

Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It’s sort of like BUGS, but with a different language for expressing models and a different sampler for sampling from their posteriors.

RStan is the R interface to Stan.

Stan Home Page

Stan’s home page is: http://mc-stan.org/

It links everything you need to get started running Stan from the command line, from R, or from C++, including full step-by-step install instructions, a detailed user’s guide and reference manual for the modeling language, and tested ports of most of the BUGS examples.

Peruse the Manual

If you’d like to learn more, the Stan User’s Guide and Reference Manual is the place to start.