This post is by Jonah and Aki.

We’re happy to announce the release of v2.0.0 of the **loo** R package for efficient approximate leave-one-out cross-validation (and more). For anyone unfamiliar with the package, the original motivation for its development is described in our paper:

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. *Statistics and Computing*, 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4. (published version, arXiv preprint)

Version 2.0.0 is a major update (release notes) to the package that we’ve been working on for quite some time, and in this post we’ll highlight some of the most important improvements. Soon I (Jonah) will follow up with a post about important new developments in our various other R packages.

**New interface, vignettes, and more helper functions to make the package easier to use**

Because of certain improvements to the algorithms and diagnostics (summarized below), the interfaces, i.e., the `loo()` and `psis()` functions and the objects they return, also needed some changes. (Click on the function names in the previous sentence to see their new documentation pages.) Other related packages in the Stan R ecosystem (e.g., **rstanarm**, **brms**, **bayesplot**, **projpred**) have also been updated to integrate seamlessly with **loo** v2.0.0. (Apologies to anyone who happened to install the update during the short window between the **loo** release and when the compatible **rstanarm**/**brms** binaries became available on CRAN.)
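As a quick illustration of the updated interface, here is a minimal sketch. The log-likelihood array below is simulated purely for illustration; in practice it would come from something like `extract_log_lik(fit, merge_chains = FALSE)` applied to a real Stan fit.

```r
library(loo)
set.seed(1)

# Simulated pointwise log-likelihood array: 500 draws x 2 chains x 100
# observations (a stand-in for extract_log_lik(fit, merge_chains = FALSE))
log_lik <- array(rnorm(500 * 2 * 100, mean = -1), dim = c(500, 2, 100))

# Relative effective sample sizes of exp(log_lik), which loo() uses to
# account for MCMC autocorrelation
r_eff <- relative_eff(exp(log_lik))

# PSIS-LOO; returns a 'loo' object containing the elpd_loo estimate,
# Pareto k diagnostics, effective sample sizes, and Monte Carlo SEs
loo_1 <- loo(log_lik, r_eff = r_eff)
print(loo_1)
```

With a real model you would simply swap the simulated array for the extracted log-likelihood and keep the rest of the calls the same.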

Three vignettes now come with the **loo** package and are also available (and more nicely formatted) online at mc-stan.org/loo/articles:

* *Using the loo package (version >= 2.0.0)* (view)
* *Bayesian Stacking and Pseudo-BMA weights using the loo package* (view)
* *Writing Stan programs for use with the loo package* (view)

A vignette about K-fold cross-validation using new K-fold helper functions will be included in a subsequent update. Since the last release of **loo** we have also written a paper, Visualization in Bayesian workflow, that includes several visualizations based on computations from **loo**.

**Improvements to the PSIS algorithm, effective sample sizes and MC errors**

The approximate leave-one-out cross-validation performed by the **loo** package depends on Pareto smoothed importance sampling (PSIS). In **loo** v2.0.0, the PSIS algorithm (the `psis()` function) corresponds to the algorithm in the most recent update to our PSIS paper, including adapting the Pareto fit with respect to the effective sample size and using a weakly informative prior to reduce the variance for small effective sample sizes. (I believe we’ll be updating the paper again with some proofs from new coauthors.)
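For readers who use `psis()` directly for importance sampling, here is a minimal sketch of the updated call. The log-ratio array is simulated for illustration; for PSIS-LOO the log ratios would be `-log_lik`.

```r
library(loo)
set.seed(1)

# Simulated log importance ratios (for PSIS-LOO these would be -log_lik)
log_ratios <- array(rnorm(500 * 2 * 30, mean = 1), dim = c(500, 2, 30))

# Relative efficiency computed on the scale recommended in the psis() docs
r_eff <- relative_eff(exp(-log_ratios))

# Pareto smoothed importance sampling
psis_result <- psis(log_ratios, r_eff = r_eff)

lw <- weights(psis_result)          # smoothed, normalized log weights
k  <- pareto_k_values(psis_result)  # one Pareto k estimate per observation
```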

For users of the **loo** package for PSIS-LOO cross-validation, and not just the PSIS algorithm for importance sampling, an even more important update is that the latest version of the PSIS paper referenced above describes how to compute the effective sample size estimate and Monte Carlo error for the PSIS estimate of `elpd_loo` (expected log predictive density for new data). Thus, in addition to the Pareto k diagnostic (an indicator of the convergence rate; see the paper) already available in previous **loo** versions, we now also report an effective sample size that takes into account both the MCMC efficiency and the importance sampling efficiency. Here’s an example of what the diagnostic output table from **loo** v2.0.0 looks like (the particular intervals chosen for binning are explained in the papers and the package documentation):

```
Pareto k diagnostic values:
                         Count Pct.   Min. n_eff
(-Inf, 0.5]   (good)     240   91.6%  205
 (0.5, 0.7]   (ok)         7    2.7%   48
   (0.7, 1]   (bad)        8    3.1%    7
   (1, Inf)   (very bad)   7    2.7%    1
```

We also compute and report the Monte Carlo SE of `elpd_loo` to give an estimate of its accuracy. If any k > 1 (which means the PSIS-LOO approximation is not reliable, as in the example above), NA is reported for the Monte Carlo SE. We hope that showing the relationship between the k diagnostic, the effective sample size, and the MCSE of `elpd_loo` will make the diagnostics easier to interpret than in previous versions of **loo**, which only reported the k diagnostic. This particular example is taken from one of the new vignettes, which uses it as part of a comparison of unstable and stable PSIS-LOO behavior.
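These diagnostics can also be inspected programmatically. A small sketch, again using a simulated log-likelihood array in place of a real fit (the helper function names are from the v2.0.0 documentation):

```r
library(loo)
set.seed(1)

# Simulated stand-in for a real model's pointwise log-likelihood
log_lik <- array(rnorm(500 * 2 * 100, mean = -1), dim = c(500, 2, 100))
loo_1 <- loo(log_lik, r_eff = relative_eff(exp(log_lik)))

pareto_k_table(loo_1)                        # the binned table of k values
bad <- pareto_k_ids(loo_1, threshold = 0.7)  # observations with k > 0.7
mcse_loo(loo_1)                              # Monte Carlo SE of elpd_loo
                                             # (NA if any k > 1)
```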

**Weights for model averaging: Bayesian stacking, pseudo-BMA and pseudo-BMA+**

Another major addition is the `loo_model_weights()` function, which, thanks to the contributions of Yuling Yao, can be used to compute weights for model averaging or selection. `loo_model_weights()` provides a user-friendly interface to the new `stacking_weights()` and `pseudobma_weights()` functions, which are implementations of the methods from Using stacking to average Bayesian predictive distributions (Yao et al., 2018). As shown in the paper, Bayesian stacking (the default for `loo_model_weights()`) provides better model averaging performance than “Akaike style” weights. However, the **loo** package also includes Pseudo-BMA weights (PSIS-LOO based “Akaike style” weights) and Pseudo-BMA+ weights, which are similar to Pseudo-BMA weights but use a so-called Bayesian bootstrap procedure to better account for the uncertainties. We recommend the Pseudo-BMA+ method instead of, for example, WAIC weights, although we prefer the stacking method to both. In addition to the Yao et al. paper, the new vignette about computing model weights demonstrates some of the motivation for our preference for stacking when appropriate.
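A minimal sketch of computing all three kinds of weights for two models follows. The log-likelihood arrays are simulated stand-ins for the pointwise log-likelihoods of real fitted models:

```r
library(loo)
set.seed(1)

# Pointwise log-likelihood arrays for two competing models (simulated)
ll1 <- array(rnorm(500 * 2 * 50, mean = -1.0), dim = c(500, 2, 50))
ll2 <- array(rnorm(500 * 2 * 50, mean = -1.1), dim = c(500, 2, 50))
r_eff_list <- list(relative_eff(exp(ll1)), relative_eff(exp(ll2)))

# Bayesian stacking (the default method)
w_stack <- loo_model_weights(list(ll1, ll2), r_eff_list = r_eff_list)

# Pseudo-BMA (BB = FALSE) and Pseudo-BMA+ (BB = TRUE, Bayesian bootstrap)
w_pbma <- loo_model_weights(list(ll1, ll2), method = "pseudobma",
                            BB = FALSE, r_eff_list = r_eff_list)
w_pbma_plus <- loo_model_weights(list(ll1, ll2), method = "pseudobma",
                                 BB = TRUE, r_eff_list = r_eff_list)
```

In each case the result is a vector of non-negative weights, one per model, summing to one.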

**Give it a try**

You can install **loo** v2.0.0 from CRAN with `install.packages("loo")`. Additionally, reinstalling an interface that provides **loo** functionality (e.g., **rstanarm**, **brms**) will automatically update your **loo** installation. The **loo** website with online documentation is mc-stan.org/loo and you can report a bug or request a feature on GitHub.

Awesome!

Thought this was a delayed April Fools’ joke, but it looks like a useful package.

It will be documented in the forthcoming “loo paper”, to the amusement of British readers.

When I do a google image search for “loo package” the results are a mix of figures from our papers and toilets:

https://www.amazon.co.uk/Wellness-Hung-Loo-Package-Pack-Free-Wall-Hanging-Badkeramik/dp/B00VVNU3X4

This sounds fantastic, can’t wait to work through the vignettes. This whole suite of R packages built upon Stan is really having a big impact on the way I work, so thanks!

I saw a post from Aki a while ago on projection predictive model selection (with Piironen, I recall). Is that in this release, or is it planned to come to loo?

Glad to hear that!

And yes, there’s the projpred package, which is being developed at https://github.com/stan-dev/projpred and uses loo for some backend computations. I forgot to mention it, but projpred was also recently updated for compatibility with this new release. I will update the post to mention that.

projpred is also on CRAN at https://cran.r-project.org/package=projpred and there are several demos at https://github.com/avehtari/modelselection_tutorial. Some of these demos will eventually be turned into proper vignettes and case studies.

If I haven’t done it before then, when you’re here in NYC soon remind me to set up a web page for projpred. It should have one like we have for the other R packages.

Just out of curiosity, will your (Aki’s) python PSIS-LOO code be updated with the new algorithm(s) and/or diagnostics?

The Python PSIS-LOO code at https://github.com/avehtari/PSIS/blob/master/py/psis.py was updated with the new algorithm six months ago. It was much easier to update just a simple stand-alone function, and only the PSIS algorithm part was updated, i.e., it does not compute the effective sample size and MCSE. loo 2.0.0 took a long time to release, as the major changes required coordination with the other packages that use it.

Excellent! Thanks! I think I’ve already been using the updated version, then.