Following up on our discussion from the other day, Angelika van der Linde sends along this paper from 2012 (link to journal here).

And Aki pulls out this great quote from Geisser and Eddy (1979):

This discussion makes clear that in the nested case this method, as Akaike’s, is not consistent; i.e., even if $M_k$ is true, it will be rejected with probability $\alpha$ as $N\to\infty$. This point is also made by Schwarz (1978). However, from the point of view of prediction, this is of no great consequence. For large numbers of observations, a prediction based on the falsely assumed $M_k$, will not differ appreciably from one based on the true $M_k$. For example, if we assert that two normal populations have different means when in fact they have the same mean, then the use of the group mean as opposed to the grand mean for predicting a future observation results in predictors which are asymptotically equivalent and whose predictive variances are $\sigma^2[1 + (1/2n)]$ and $\sigma^2[1 + (1/n)]$, with loss of efficiency $(2n + 2)^{-1}$.
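To spell out the arithmetic at the end of the quote (assuming two groups of $n$ observations each, with known variance $\sigma^2$, so the grand mean is based on $2n$ observations while a group mean is based on $n$): the predictive variances are $\sigma^2[1 + 1/(2n)]$ for the grand mean and $\sigma^2[1 + 1/n]$ for the group mean, and the efficiency of the group-mean predictor relative to the grand-mean predictor is

$$\frac{\sigma^2\,[1 + 1/(2n)]}{\sigma^2\,[1 + 1/n]} = \frac{2n+1}{2n+2}, \qquad \text{so the loss of efficiency is } 1 - \frac{2n+1}{2n+2} = \frac{1}{2n+2},$$

which matches the $(2n+2)^{-1}$ in the quote, and which indeed goes to zero as $n\to\infty$.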

And I’d like to pull out something from the comments, in which Brendon Brewer wrote, “I’m happy to avoid all these XIC things, as they seem like strange approximations to something I can already accomplish with marginal likelihoods.” I’d be happy to avoid “these XIC things” too—writing our paper took a lot of unpleasant work—but I felt the need to do so because comparing models is a real issue in a lot of applied research. I don’t actually do much model comparison myself—my usual approach is to fit the most complicated model that I can handle, and then get frustrated that I can’t do more—but I recognize that others feel the need for predictive comparisons.

And a key point here, which we discuss in chapter 7 of BDA (chapter 6 of the first and second editions) is that marginal likelihoods (also very misleadingly called “evidence”) do *not* in general solve the problem of predictive model comparison.

Out-of-sample prediction error (which is what AIC, DIC, and WAIC are estimating) is not the same as marginal likelihood. With sufficiently weak priors, it’s easy to construct models that fit the data well and give accurate predictions but whose marginal likelihoods are as low as you want. In many settings it can make sense to compare fitted models using estimated out-of-sample prediction error, while it will not make sense to compare them using marginal likelihoods. The problem is that, with continuous-parameter models, the marginal likelihood can depend strongly on aspects of the prior distribution that have essentially no impact on the posterior distributions of the individual models. This issue is well known, but perhaps not well known enough.
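Here’s a quick numerical sketch of that last point, using the simplest possible case: a conjugate normal model $y_i \sim \mathrm{N}(\theta, \sigma^2)$ with $\sigma$ known and prior $\theta \sim \mathrm{N}(0, \tau^2)$. (The numbers below are made up for illustration.) Integrating out $\theta$ gives the marginal distribution $y \sim \mathrm{N}(0, \sigma^2 I + \tau^2 J)$, so the marginal likelihood can be computed exactly; as the prior scale $\tau$ grows, the log marginal likelihood falls by roughly $\log 10$ for every factor of 10 in $\tau$, while the posterior for $\theta$ barely moves:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, sigma = 20, 1.0
y = rng.normal(0.3, sigma, size=n)  # simulated data; true theta = 0.3

results = []
for tau in [1.0, 10.0, 100.0, 1000.0]:
    # Marginal likelihood: integrating theta out of the conjugate
    # normal-normal model gives y ~ N(0, sigma^2 I + tau^2 J).
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    log_ml = multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(y)
    # Posterior for theta under the same model (precision-weighted mean).
    prec = n / sigma**2 + 1 / tau**2
    post_mean = (y.sum() / sigma**2) / prec
    post_sd = prec**-0.5
    results.append((tau, log_ml, post_mean, post_sd))
    print(f"tau={tau:7.1f}  log marginal lik = {log_ml:8.2f}  "
          f"posterior for theta: {post_mean:.3f} +/- {post_sd:.3f}")
```

The posterior (and hence any prediction for a new observation) is essentially identical across priors, but the marginal likelihood can be made arbitrarily small by widening the prior — exactly the sensitivity described above.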

On page 15, in the formula right before “In expectation: AIC and DIC”, I think there is a missing minus sign before the $\frac{n}{2}\log(2\pi)$ term.