Moving cross-validation from a research idea to a routine step in Bayesian data analysis

This post is by Aki.

Andrew has a Twitter bot @StatRetro tweeting old blog posts. A few weeks ago, the bot tweeted a link to a 2004 blog post, Cross-validation for Bayesian multilevel modeling. Here are some quick thoughts on it now.

Andrew started with the question “What can be done to move cross-validation from a research idea to a routine step in Bayesian data analysis?” and mentioned importance sampling as a possibility, but then continued: “However, this isn’t a great practical solution since the weights, 1/p(y_i|theta), are unbounded, so the importance-weighted estimate can be unstable.” We now have Pareto smoothed importance sampling leave-one-out (PSIS-LOO) cross-validation (Vehtari, Gelman, and Gabry, 2017) implemented, e.g., in the `loo` R package and the `ArviZ` Python/Julia package; they have been downloaded millions of times and seem to be routinely used in Bayesian workflows! The benefit of the approach is that in many cases the user doesn’t need to do anything extra, or only needs to add a few lines to their Stan code; the computation after sampling is really fast; and the method has a diagnostic to tell whether some other, computationally more intensive approach is needed.
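For example, with a model fitted through an interface such as rstanarm that already stores the pointwise log-likelihood, PSIS-LOO is a single function call. A minimal sketch, where the data and model are just placeholders for illustration:

```r
library(rstanarm)   # or any Stan interface that provides pointwise log-likelihood draws
library(loo)

# Placeholder data and model, just to have something to cross-validate
d <- data.frame(y = rnorm(100), x = rnorm(100))
fit <- stan_glm(y ~ x, data = d, refresh = 0)

loo_fit <- loo(fit)   # PSIS-LOO, computed from the posterior draws already obtained
print(loo_fit)        # elpd_loo, p_loo, and the Pareto k diagnostic summary
```

With hand-written Stan code, the extra lines are a generated quantities block that computes log_lik, after which loo::extract_log_lik() and loo::loo() do the same job.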

Andrew also discussed multilevel models: “When data have a multilevel (hierarchical) structure, it would make sense to cross-validate by leaving out data individually or in clusters, for example, leaving out a student within a school or leaving out an entire school. The two cross-validations test different things.” PSIS-LOO is great for leave-one-student-out, but leaving out an entire school often changes the posterior so much that even PSIS can’t handle it. In such cases it’s still easiest to use K-fold-CV (i.e., do the brute-force computation K times, with K possibly smaller than the number of schools). It is possible to use PSIS, but then additional quadrature integration over the parameters of the left-out school is needed to get useful results (e.g., Merkle, Furr, and Rabe-Hesketh, 2019). We’re still thinking about how to make cross-validation for multilevel models easier and faster.
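As a sketch of the two cross-validations with the loo and rstanarm packages (the school-level data and model below are hypothetical): leave-one-student-out comes from loo(), and leave-school-out from kfold() with folds that keep each school intact.

```r
library(rstanarm)
library(loo)

# Hypothetical data: students nested in schools
d <- data.frame(
  score  = rnorm(200),
  school = factor(rep(1:20, each = 10))
)
fit <- stan_lmer(score ~ 1 + (1 | school), data = d, refresh = 0)

# Leave-one-student-out: PSIS-LOO is usually sufficient
loo_student <- loo(fit)

# Leave-school-out: brute-force K-fold-CV with whole schools assigned to folds
folds <- kfold_split_grouped(K = 10, x = d$school)
kfold_school <- kfold(fit, folds = folds)
```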

Andrew didn’t discuss time series or non-factorized models, but we can use PSIS to compute leave-future-out cross-validation for time series models (Bürkner, Gabry, and Vehtari, 2020a), and for multivariate normal and Student-t models we can do one part analytically and the rest with PSIS (Bürkner, Gabry, and Vehtari, 2020b).
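As a rough sketch of what leave-future-out CV targets, here is the exact, brute-force version with a simple placeholder trend model in rstanarm (the data, model, and minimum training size L are all hypothetical). PSIS-LFO (Bürkner, Gabry, and Vehtari, 2020a) estimates the same quantity while avoiding most of the refits, refitting only when the Pareto k diagnostic becomes too large.

```r
library(rstanarm)

set.seed(1)
# Hypothetical time series and a simple trend model as a stand-in
d <- data.frame(time = 1:60,
                y    = as.numeric(arima.sim(list(ar = 0.7), n = 60)) + 0.05 * (1:60))
L <- 40   # minimum number of past observations used for fitting

elpd_lfo <- numeric(0)
for (i in L:(nrow(d) - 1)) {
  fit_i <- stan_glm(y ~ time, data = d[1:i, ], refresh = 0)
  # log predictive density for the next, not-yet-seen observation
  ll_next <- log_lik(fit_i, newdata = d[i + 1, , drop = FALSE])
  elpd_lfo <- c(elpd_lfo, log(mean(exp(ll_next))))
}
sum(elpd_lfo)   # 1-step-ahead ELPD estimated by exact leave-future-out CV
```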

Andrew mentioned DIC, and we later analyzed the properties of DIC, WAIC, and leave-one-out cross-validation (Gelman, Hwang, and Vehtari, 2014); eventually PSIS-LOO has proved to be the most reliable and has the best self-diagnostic (Vehtari, Gelman, and Gabry, 2017).
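As a quick illustration of why we ended up preferring PSIS-LOO: for the same fitted model, the loo package reports both estimates, but only PSIS-LOO comes with per-observation Pareto k diagnostics (the data and model are again placeholders):

```r
library(rstanarm)
library(loo)

d   <- data.frame(y = rnorm(100), x = rnorm(100))   # placeholder data
fit <- stan_glm(y ~ x, data = d, refresh = 0)

waic(fit)   # WAIC estimate of elpd
loo(fit)    # PSIS-LOO estimate, with per-observation Pareto k diagnostics
```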

Andrew also mentioned my 2002 paper on cross-validation, so I knew that he was aware of my work, but it still took several years before I had the courage to contact him and propose a research visit. That research visit was great, and I think we can say we (including all co-authors and people writing software) have been able to take some concrete steps toward making cross-validation a more routine step.

Although we are advocating routine use of cross-validation, I want to remind readers that we are not advocating cross-validation for model selection as a form of hypothesis testing (see, e.g., this talk, and Gelman et al. 2020). Ideally the modeller includes all the uncertainties in the model, integrates over those uncertainties, and checks that the model makes sense. There is then no need to select any model, as the model that best expresses the information available to the modeller, and the related uncertainties, is all that is needed. However, cross-validation is useful for assessing how good a single model is, for model checking (diagnosing misspecification), for understanding differences between models, and for speeding up the model-building workflow (we can quickly discard really bad models and focus on more useful ones; see, e.g., this talk on Bayesian workflow).
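For understanding differences between models, loo_compare() reports the difference in expected log predictive density together with its standard error, which is more informative than a single-number ranking. A small sketch with hypothetical placeholder models:

```r
library(rstanarm)
library(loo)

d <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))  # placeholder data
fit1 <- stan_glm(y ~ x1,      data = d, refresh = 0)
fit2 <- stan_glm(y ~ x1 + x2, data = d, refresh = 0)

loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo_compare(loo1, loo2)   # elpd differences and their standard errors, not a test
```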

You can find more papers and discussion of cross-validation in CV-FAQ, and stay tuned for more!

1 thought on “Moving cross-validation from a research idea to a routine step in Bayesian data analysis”

  1. Nice! I would like to add how well this software (loo package) is supported with Aki (and others) often taking the time to give really helpful answers to questions on the Stan forum. Much appreciated!
