Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models

Posted on January 25, 2023 9:29 AM by Andrew

Paul, Jonah, and Aki write:

Cross-validation can be used to measure a model’s predictive accuracy for the purpose of model comparison, averaging, or selection. Standard leave-one-out cross-validation (LOO-CV) requires that the observation model can be factorized into simple terms, but a lot of important models in temporal and spatial statistics do not have this property or are inefficient or unstable when forced into a factorized form. We derive how to efficiently compute and validate both exact and approximate LOO-CV for any Bayesian non-factorized model with a multivariate normal or Student-t distribution on the outcome values. We demonstrate the method using lagged simultaneously autoregressive (SAR) models as a case study.

Aki’s post from last year, “Moving cross-validation from a research idea to a routine step in Bayesian data analysis,” connects this to the bigger picture.

8 thoughts on “Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models”

Unanon on January 25, 2023 11:33 AM at 11:33 am said:

Tangential to the post: I never understood the benefit to LOOCV compared to 10-fold or bootstrap validation. Training on N-1 datapoints and predicting for hold-out 1 datapoint and repeating across all datapoints seems like a way for models to be overtrained: the original (full) dataset barely changed at all!!! Maybe LOOCV is more useful for small datasets?

Reply ↓
- Brice on January 25, 2023 11:48 AM at 11:48 am said:
  
  LOO is a way to estimate the expected loss for a single new data point, having trained the model on all of the other available data points. I’m not sure what overtrained means, if you have an overfit model that’ll show up in this score like any other, and generally when I have deployed models I don’t want to leave any training data on the table, why would I?
  
  Reply ↓
  - Unanon on January 25, 2023 11:18 PM at 11:18 pm said:
    
    You don’t leave training data on the table by fitting your selected model to your full dataset. This is different than selecting a model that generalizes best, based on cross-validation results.
    
    Reply ↓
- somebody on January 25, 2023 11:56 AM at 11:56 am said:
  
  LOO typically has a lower variance. For estimating OOS performance, it can also lower the bias of having a smaller training set than you will actually use.
  
  There aren’t a whole lot of general results for the behavior of CV estimators.
  
  Reply ↓
  - Unanon on January 25, 2023 11:25 PM at 11:25 pm said:
    
    Somebody–from my understanding, that’s not quite correct. LOOCV has lower bias and greater variance. You can refer to Elements of Statistical Learning for trade-offs associated with LOOCV and K-Fold.
    
    Reply ↓
    - somebody on January 26, 2023 12:48 AM at 12:48 am said:
      
      Elements is flatly wrong in that chapter. I’ve also never understood their argument about a correlated training set.
      
      Here’s a good summary
      
      https://stats.stackexchange.com/questions/61783/bias-and-variance-in-leave-one-out-vs-k-fold-cross-validation
      
      Simulations usually show that variance decreases with K, but there are no general properties with cross validation.
    - Unanon on January 26, 2023 1:06 PM at 1:06 pm said:
      
      Somebody–thank you for your comment. The bias-variance tradeoff as described in Elements is just a little confusing to me, and I just appreciate the dialogue.
- psych defector on January 25, 2023 2:08 PM at 2:08 pm said:
  
  There are computational trick for computing exact LOO-CV without having to refit the model (the classic version uses the hat matrix I think, can’t remember the Bayesian). So it’s a lot more efficient.
  
  Reply ↓

Statistical Modeling, Causal Inference, and Social Science

Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models

8 thoughts on “Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models”

Leave a Reply Cancel reply