Practical Bayesian model evaluation in Stan and rstanarm using leave-one-out cross-validation

Posted on October 21, 2016 11:51 AM by Aki Vehtari

Our (Aki, Andrew and Jonah) paper "Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC" was recently published in Statistics and Computing. In the paper we show:

why it’s better to use LOO instead of WAIC for model evaluation
how to compute LOO quickly and reliably using the full posterior sample
how Pareto smoothed importance sampling (PSIS) reduces the variance of the LOO estimate
how Pareto shape diagnostics can be used to indicate when PSIS-LOO fails
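
To make the second point concrete, here is a minimal Python sketch of how pointwise LOO can be computed from draws of the full posterior, with WAIC alongside for comparison. It is not the implementation from the paper: it uses plain importance sampling and deliberately omits the Pareto smoothing step that stabilizes the largest importance ratios. The function names and the toy matrix are assumptions made for illustration only.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def is_loo_pointwise(log_lik):
    """Raw importance-sampling LOO (no Pareto smoothing).

    log_lik[s][i] is log p(y_i | theta_s) for posterior draw s and
    observation i. With importance ratios 1 / p(y_i | theta_s), the LOO
    predictive density of y_i is the harmonic mean of p(y_i | theta_s).
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    elpd = []
    for i in range(n_obs):
        neg = [-log_lik[s][i] for s in range(n_draws)]
        elpd.append(math.log(n_draws) - log_sum_exp(neg))
    return elpd


def waic_pointwise(log_lik):
    """Pointwise WAIC on the elpd scale; needs at least two draws."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    out = []
    for i in range(n_obs):
        col = [log_lik[s][i] for s in range(n_draws)]
        lpd = log_sum_exp(col) - math.log(n_draws)  # log pointwise density
        mean = sum(col) / n_draws
        p_waic = sum((v - mean) ** 2 for v in col) / (n_draws - 1)
        out.append(lpd - p_waic)
    return out


# Toy matrix (made up): 3 draws, 2 observations.
toy_log_lik = [[-1.0, -2.5], [-1.2, -2.0], [-0.9, -3.0]]
elpd_loo = sum(is_loo_pointwise(toy_log_lik))
elpd_waic = sum(waic_pointwise(toy_log_lik))
```

Both estimates reuse the same posterior draws, so no refitting is needed. The raw importance ratios can have very high variance, which is the problem the Pareto smoothing addresses.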

PSIS-LOO makes automated LOO practical in rstanarm, which provides a flexible way to use pre-compiled Stan regression models. Estimation by sampling obtains draws from the full posterior, and these same draws are used to compute the PSIS-LOO estimate at negligible additional computational cost. PSIS-LOO can fail, but possible failure is reliably detected by the Pareto shape diagnostics. If there are high estimated Pareto shape values, a summary of them is reported to the user with suggestions for what to do next. In the initial modeling phase the user can ignore the warnings (and still get more reliable results than with WAIC or DIC). If there are only a few high estimated Pareto shape values, rstanarm offers to rerun the inference only for the problematic leave-one-out folds (in the paper we named this approach PSIS-LOO+). If there are many high values, rstanarm offers to run k-fold-CV. This way a fast estimate of predictive performance is always provided, and the user can decide how much additional computation time to spend on more accurate results. In the future we will add other utility and cost functions, such as explained variance, MAE, and classification accuracy, to make the predictive performance easier to interpret.
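
The decision flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not rstanarm's code: the function name `triage_pareto_k`, the cutoff `k_threshold=0.7`, and the limit `max_refits=10` separating "a few" from "many" high values are all assumptions chosen for the example.

```python
def triage_pareto_k(k_values, k_threshold=0.7, max_refits=10):
    """Suggest a next step from estimated Pareto shape values.

    k_values: one estimated Pareto shape value per observation.
    k_threshold, max_refits: hypothetical cutoffs, not taken from the post.
    Returns (suggested action, indices of flagged observations).
    """
    flagged = [i for i, k in enumerate(k_values) if k > k_threshold]
    if not flagged:
        action = "use PSIS-LOO estimate"
    elif len(flagged) <= max_refits:
        # Few problematic folds: refit the model without each flagged
        # observation and compute its LOO term exactly (PSIS-LOO+).
        action = "refit flagged folds (PSIS-LOO+)"
    else:
        # Many problematic folds: refitting each is costly, so fall back
        # to k-fold cross-validation.
        action = "run k-fold-CV"
    return action, flagged


# Made-up shape values for five observations; one exceeds the cutoff.
action, flagged = triage_pareto_k([0.12, 0.35, 0.91, 0.40, 0.05])
```

In every branch the fast PSIS-LOO estimate is still available, so the extra computation is optional.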

The above approach can also be used when running Stan via interfaces other than rstanarm, although the user then needs to add a few lines to the usual Stan code. After this, PSIS-LOO and the diagnostics are easily computed using the available packages for R, Python, and Matlab.
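
The quantity those extra lines need to store is the pointwise log-likelihood of each observation for each posterior draw. The Python sketch below shows that quantity for a hypothetical normal model with draws of a common mean and standard deviation. It does not show Stan syntax, and the function names and toy numbers are assumptions for illustration.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of a normal distribution with mean mu and sd sigma."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def pointwise_log_lik(y, draws):
    """Build the draws-by-observations log-likelihood matrix.

    y: list of observed values.
    draws: list of (mu, sigma) posterior draws (hypothetical model).
    Row s, column i holds log p(y_i | mu_s, sigma_s).
    """
    return [[normal_lpdf(y_i, mu, sigma) for y_i in y] for mu, sigma in draws]


# Made-up data and posterior draws.
y = [0.3, -1.1, 2.0]
draws = [(0.1, 1.0), (0.4, 1.2), (0.2, 0.9), (0.3, 1.1)]
log_lik = pointwise_log_lik(y, draws)  # 4 draws by 3 observations
```

A matrix of this shape is the usual input from which LOO estimates and Pareto shape diagnostics are then computed.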

7 thoughts on “Practical Bayesian model evaluation in Stan and rstanarm using leave-one-out cross-validation”

Mike Lawrence on October 22, 2016 at 12:04 pm said:

Neat! I understand how one can use LOO for model comparison, but the paper notes that it can be useful as a posterior predictive check as well. It would be great to see an example of this latter usage. Would you be looking at the distribution of pointwise LOO values? Or maybe adding code in generated quantities that samples new observations given the model and creates a log_lik2 for these simulated samples, permitting you to loo::compare(loo::loo(log_lik), loo::loo(log_lik2))?

- Aki Vehtari on October 28, 2016 at 8:04 am said:
  
  LOO can be used for marginal predictive checks (see, e.g. Gelfand, 1996, “Model determination using sampling-based methods”, or Bayesian Data Analysis, 3rd ed, pp 152-153). We’ll add an example for this in the near future.
  
- Shravan on October 28, 2016 at 8:41 am said:
  
  Here is an application in psycholinguistics by Bruno Nicenboim:
  
  http://www.ling.uni-potsdam.de/~vasishth/pdfs/NicenboimVasishth2016-ModelsofRetrieval.pdf
  
  - Aki Vehtari on October 29, 2016 at 1:51 pm said:
    
    Cool!
    
Gmcirco on October 22, 2016 at 2:42 pm said:

Interesting. Richard McElreath is a big proponent of WAIC in his book “Statistical Rethinking”. I’m curious to see how these compare.

- Aki Vehtari on October 28, 2016 at 8:17 am said:
  
  I was also a big proponent of WAIC before doing the research which led to this paper. WAIC is a significant improvement over DIC, and Watanabe’s papers are important for Bayesian LOO, but PSIS-LOO is more reliable and easier to diagnose for potential failure. See also the results in Vehtari et al (2016) “Bayesian Leave-One-Out Cross-Validation Approximations for Gaussian Latent Variable Models” http://jmlr.org/papers/v17/14-540.html.
  
Sumio Watanabe on October 30, 2016 at 11:45 pm said:

Dear Professor Aki Vehtari, Pareto Smoothed Importance Sampling Cross Validation (PSISCV) is a very interesting method for approximating Bayesian cross validation (BCV). Although WAIC is asymptotically equivalent to BCV, it is not a tool for approximating BCV but an estimator of the generalization error (GE). I would recommend comparing cross validation and information criteria as statistical estimators of the generalization error. A simple experiment shows that there is a case where E|PSISCV-GE| > E|WAIC-GE|, which is shown on my web page. I have heard from statisticians that any estimator should be studied in terms of its bias and variance.

Statistical Modeling, Causal Inference, and Social Science
