Stan 1.2.0 and RStan 1.2.0

Stan 1.2.0 and RStan 1.2.0 are now available for download. See:

Here are the highlights.

Full Mass Matrix Estimation during Warmup

Yuanjun Gao, a first-year grad student here at Columbia (!), built a regularized mass-matrix estimator. This helps for posteriors with high correlation among parameters and varying scales. We’re still testing this ourselves, so the estimation procedure may change in the future (don’t worry — it satisfies detailed balance as is, but we might be able to make it more computationally efficient in terms of time per effective sample).

It’s not the default option. The main reason is that the required matrix operations are expensive, raising the cost of the algorithm to $\mathcal{O}(k m n^2 + n^3 \log m)$, where $k$ is the average number of leapfrog steps per iteration, $m$ is the number of iterations, and $n$ is the number of parameters.

Yuanjun did a great job with the Cholesky factorizations and implemented this about as efficiently as is possible. (His homework for Andrew’s class was also the inspiration for the Gaussian process models in the manual.)

It’s integrated with NUTS.
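
To give a rough feel for what a regularized full mass-matrix estimate involves, here is a minimal Python sketch. It is not Stan's implementation: the rule of shrinking off-diagonal entries, the weight `alpha`, and all function names are assumptions made purely for illustration. It computes a sample covariance from warmup draws, shrinks it toward its diagonal, and takes a Cholesky factor (the step that is cubic in the number of parameters).

```python
import math

def sample_covariance(draws):
    """Sample covariance matrix of a list of equal-length draws."""
    m, n = len(draws), len(draws[0])
    means = [sum(d[j] for d in draws) / m for j in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for d in draws:
        for i in range(n):
            for j in range(n):
                cov[i][j] += (d[i] - means[i]) * (d[j] - means[j])
    return [[c / (m - 1) for c in row] for row in cov]

def regularize(cov, alpha=0.1):
    """Shrink off-diagonal entries toward zero (hypothetical rule, not Stan's)."""
    n = len(cov)
    return [[cov[i][j] if i == j else (1.0 - alpha) * cov[i][j]
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

# Made-up warmup draws for two strongly correlated parameters.
warmup_draws = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
cov = regularize(sample_covariance(warmup_draws))
L = cholesky(cov)
```

Shrinking toward the diagonal keeps the estimate positive definite even when there are few warmup draws, which is one plausible reason to regularize at all.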

Cumulative Distribution Functions

The practical upshot is that Stan supports more truncated distributions, and hence more truncated and censored data models.
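
The link between distribution functions and truncation is that a density truncated to the interval [L, U] is the original density divided by F(U) - F(L). Here is a minimal Python sketch for the normal case; the function names are hypothetical and this is not how Stan implements it.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def normal_cdf(y, mu, sigma):
    """Distribution function of Normal(mu, sigma) at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def truncated_normal_lpdf(y, mu, sigma, lower, upper):
    """Log density of Normal(mu, sigma) truncated to [lower, upper]."""
    if y < lower or y > upper:
        return -math.inf
    # Probability mass the untruncated normal puts on [lower, upper].
    mass = normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)
    return normal_lpdf(y, mu, sigma) - math.log(mass)
```

For a standard normal truncated below at zero, the mass is one half, so the truncated log density is the usual one plus log 2.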

Michael Betancourt did the heavy lifting here, which involved a crazy amount of “special function” derivative calculations and implementations. Everyone knows that the derivative of a distribution function with respect to the variate is the density. But what about the partials with respect to the other parameters? We’ll be documenting all of the functions and derivatives in the manual.
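
For a concrete and deliberately easy case, take the normal distribution function F(y | mu, sigma) = Phi((y - mu) / sigma). The Python sketch below writes out its three partial derivatives and checks them against central finite differences. It is a toy illustration with hypothetical function names, not Stan's code; distributions whose distribution functions involve special functions are far less tidy.

```python
import math

def normal_cdf(y, mu, sigma):
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf_partials(y, mu, sigma):
    """Partials of the normal CDF with respect to (y, mu, sigma)."""
    dens = normal_pdf(y, mu, sigma)
    z = (y - mu) / sigma
    # dF/dy is the density, dF/dmu is minus the density,
    # and dF/dsigma is -z times the density.
    return dens, -dens, -z * dens

def finite_diff(f, args, i, h=1e-6):
    """Central finite difference of f in its i-th argument."""
    hi, lo = list(args), list(args)
    hi[i] += h
    lo[i] -= h
    return (f(*hi) - f(*lo)) / (2.0 * h)

point = (0.7, 0.2, 1.5)  # arbitrary (y, mu, sigma)
analytic = normal_cdf_partials(*point)
numeric = tuple(finite_diff(normal_cdf, point, i) for i in range(3))
```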

Daniel Lee generalized the entire density and distribution function testing framework to generate code for tests. We’re doing much more extensive tests of the vectorizations and derivatives. Also, Daniel implemented efficient vectorized derivatives for many more of the density functions.

Model Log Probability and Derivatives in R

Jiqiang Guo, who’s at the helm of RStan, wrote code to allow users to access the log probability function in a Stan model and its gradients directly. The functions are parameterized with the unconstrained parameterization of a Stan model with support on all of R^N. He also exposed the model functions to convert back and forth between the constrained and unconstrained parameterizations for initialization and interpretation of the samples.
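
The idea behind the unconstrained parameterization can be sketched in a few lines, shown here in Python rather than R. For a positive parameter such as a scale, a log transform maps it onto the whole real line, and the log density picks up a Jacobian term. The toy target (an exponential with rate 1) and all function names below are hypothetical; this is not RStan's interface.

```python
import math

def unconstrain(sigma):
    """Map sigma > 0 to the real line."""
    return math.log(sigma)

def constrain(u):
    """Map a real number back to sigma > 0."""
    return math.exp(u)

def log_prob_constrained(sigma):
    """Toy target: Exponential(1) log density on sigma > 0."""
    return -sigma

def log_prob_unconstrained(u):
    """Log density of u = log(sigma), including log |d sigma / d u| = u."""
    return log_prob_constrained(constrain(u)) + u

def grad_log_prob_unconstrained(u):
    """Derivative of (-exp(u) + u) with respect to u."""
    return 1.0 - math.exp(u)
```

Without the Jacobian term, the density over u would not correspond to the intended density over sigma, which matters for sampling on the unconstrained scale.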

David Blei suggested that if we added this feature, people could do interesting things in R with it, such as optimization. Let us know if you find it helpful.
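
As one hedged illustration of using a log probability function and its gradient for optimization, here is a plain gradient-ascent sketch, again in Python rather than R. The log density and gradient are hand-written stand-ins for what a compiled model would supply, and the step size and iteration count are arbitrary choices.

```python
def log_prob(theta):
    """Stand-in log density: independent unit normals centered at (1, -2)."""
    return -0.5 * ((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)

def grad_log_prob(theta):
    """Gradient of the stand-in log density."""
    return [-(theta[0] - 1.0), -(theta[1] + 2.0)]

def gradient_ascent(grad, theta, step=0.1, iters=500):
    """Repeatedly step uphill along the gradient from a starting point."""
    theta = list(theta)
    for _ in range(iters):
        g = grad(theta)
        theta = [t + step * gi for t, gi in zip(theta, g)]
    return theta

mode = gradient_ascent(grad_log_prob, [0.0, 0.0])
```

In practice one would hand the two functions to an existing optimizer instead of writing the loop by hand; the point is only that a log density and its gradient are all such a routine needs.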

Print Posterior Summary Statistics from Command Line

Daniel Lee wrote a program to print a summary of one or more chains from the command line, mirroring the print() command of RStan.
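
For readers curious what goes into such a summary, here is a minimal Python sketch that computes the mean, standard deviation, and the classic (non-split) potential scale reduction statistic R-hat for one parameter across several chains. The function name and output layout are hypothetical, and Stan's own program may use a different variant of R-hat; quantiles and standard errors are omitted.

```python
import math
import statistics

def summarize(chains):
    """Mean, sd, and classic R-hat for equal-length chains of one parameter."""
    n = len(chains[0])
    draws = [x for chain in chains for x in chain]
    chain_means = [statistics.mean(c) for c in chains]
    # Average within-chain variance and between-chain variance.
    within = statistics.mean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_plus = (n - 1) / n * within + between / n
    return {
        "mean": statistics.mean(draws),
        "sd": statistics.stdev(draws),
        "rhat": math.sqrt(var_plus / within),
    }

# Two short made-up chains that disagree slightly about the mean.
summary = summarize([[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]])
```

Values of R-hat well above 1 suggest the chains have not yet mixed.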

Bug Fixes

We fixed a bad memory leak in multivariate operations that was introduced in the last release when we optimized the matrix operations for derivative calculations. We also fixed an issue with conservative matrix resizing that caused multivariate models to crash under Windows at optimization levels above 0.

The Future

There has been a lot of activity in various branches that haven’t been merged into the trunk yet, so stay tuned.

Release Notes
v1.2.0 (6 March 2013)

* full mass matrix estimation during warmup
* expose model log_prob and gradient functions in RStan for use
  in other packages (such as optimizers)
* command-line program to display output from multiple chains
  with parameter-by-parameter mean, se, sd, quantiles, and R-hat
* probability function speed improvements with vectorization
* created Stan contributed repositories for user-contributed
  and experimental features (first entry is an emacs mode)
* modified makefiles so targets are the same under Windows,
  Linux, and Mac

New Functions
* most of the cumulative distribution functions (see the documentation
  index for the full list of supported functions)
* added monitor() function in RStan

Bug Fixes
* disabled Boost asserts in parser to quiet R's warnings
* enabled prints in generated quantities block
* various documentation patches
* fixed memory leak in matrix operations leading to leaks in
  multivariate probability function use
* wrapped call to gradient log prob to catch unexpected exceptions
* fixed matrix resize issue on Windows that caused models to fail
  at optimization levels above 0
* fixed bug in print preventing hyphens or grave accents from
* fixed issue preventing matrix rows from being assigned on the
  left side of an assignment statement
* clearer error messages on matrix and other function arguments

10 thoughts on “Stan 1.2.0 and RStan 1.2.0”

  1. Wow. Your release cycle seems really fast. Is there a convention behind the numbering? Are some releases stable and others beta?

    • I’d prefer the release cycle to be a little slower, because releases are work for both us and our users. In this case, the main motivation was that we wanted to fix the memory leak and array access and Windows optimization bugs we introduced in the previous release. It has so many new features because the dev team’s been very busy.

      The convention behind the numbering is the usual one: Major.Minor.Patch. When we just fix bugs or introduce minor new functionality, we increment the patch version. When we introduce notable new functionality, we increment the minor version. When we release Riemann Manifold HMC, that’s going to be major and we’ll increment the major version number.

      We specifically didn’t want to go down the Zeno version numbering route, where our version numbers increase, but never get to 1. We also didn’t want to get stuck in the Boost or Java rut, where the initial digit never changes.

      All of the releases are intended to be stable. They all pass all of our unit and functional tests. If we were super disciplined, we’d release release candidates, let people test them, then betas, and let people test them, and then finally roll out a real version. As far as we know, nobody has code that depends on our C++, so it isn’t such an issue for us.

  2. I’m really impressed by the release cycle. Looks like Stan will definitely be my default engine for Bayesian inference!

    I also want to warmly thank the team for the new Emacs mode (stan-mode)!
    I have a question about this mode: do you plan to integrate it into ESS (Emacs Speaks Statistics)?
    If so, would you want it to be a subset of the ESS BUGS mode, which already supports OpenBUGS and JAGS?

  3. Pingback: Andrew Gelman: Stan 1.2.0 and RStan 1.2.0 available | Statistics with R |

    • Please don’t listen to Andrew. “Stan” is not an acronym. And unlike AT&T and IBM, it didn’t start life as an acronym. Stan is named after Stanislaw Ulam, with Andrew riffing on the Eminem song (which I won’t link to, because we want to win the Google juice war!).

      Andrew’s trying to introduce a so-called “backronym”.

      I beg of you, please don’t use this backronym. And if you hear others using it, I fully support mocking their ignorance (just kidding; one should always be nice to anyone mentioning Stan in conversation).

  4. Does anyone else get an error when they try to install rstan using the instructions here: I’m sure I’m doing something stupid, but on the off chance that there’s a problem…

    I enter:
    > options(repos = c(getOption("repos"), rstan = ""))
    > install.packages('rstan', type = 'source')

    And I get the warning messages:
    Warning in install.packages :
    cannot open: HTTP status was ‘404 Not Found’
    Warning in install.packages :
    cannot open: HTTP status was ‘404 Not Found’
    Warning in install.packages :
    unable to access index for repository
    Warning in install.packages :
    package ‘rstan’ is not available
