Skip to content

Those people who go around telling you not to do posterior predictive checks

I started to post this item on posterior predictive checks and then I realize I already did post it several months ago! Memories (including my own) are short, though, so here it is again:

A researcher writes,

I have made use of the material in Ch. 6 of your Bayesian Data Analysis book to help select among candidate models for inference in risk analysis. In doing so, I have received some criticism from an anonymous reviewer that I don’t quite understand, and was wondering if you have perhaps run into this criticism. Here’s the setting. I have observable events occurring in time, and I need to choose between a homogeneous Poisson process, and a nonhomogeneous Poisson process, in which the rate is a function of time ( e.g., lognlinear model for the rate, which I’ll call lambda).

I could use DIC to select between a model with constant lambda and one where the log of lambda is a linear function of time. However, I decided to try to come up with an approach that would appeal to my frequentist friends, who are more familiar with a chi-square test against the null hypothesis of constant lambda. So, following your approach in Ch. 6, I had WinBUGS compute two posterior distributions. The first, which I call the observed chi-square, subtracts the posterior mean (mu[i] = lambda[i]*t[i]) from each observed value, square this, and divides by the mean. I then add all of these values up, getting a distribution for the total. I then do the same thing, but with draws from the posterior predictive distribution of X. I call this the replicated chi-square statistic.

If my putative model has good predictive validity, it seems that the observed and replicated distributions should have substantial overlap. I called this overlap (calculated with the step funtion in WinBUGS) a “Bayesian p-value.” The model with the larger p-value is a better fit, just like my frequentist friends are used to.

Now to the criticism. An anonymous reviewer suggests this approach is weakened by “using the observed data twice.” Well, yes, I do use the observed data to estimate the posterior distribution of mu, and then I use it again to calculate a statistic. However, I don’t see how this is a problem, in the sense that empirical Bayes is problematic to some because it uses the data first to estimate a prior distribution, then again to update that prior. I am also not interested in “degrees of freedom” in the usual sense associated with MLEs either.

I am tempted to just write this off as a confused reviewer, but I am not an expert in this area, so I thought I would see if I am missing something. I appreciate any light you can shed on this problem.

My thoughts:

1. My first thought is that the safest choice is the nonhomogeneous process since it includes the homogeneous as a special case (in which the variation in the rate has zero variance over time). This can be framed as a modeling problem in which the variance of the rate is an unknown parameter which must be nonnegative. If you have a particular parametric model (e.g., log(rate(t))=a+b*x(t)+epsilon(t), where epsilon(t) has mean 0 and sd sigma), then the homogeneous model is a special case like a=b=sigma=0.

2. From this perspective, I’d rather just estimate a,b,sigma and see to what extent the data are consistent with a=0, b=0, sigma=0.

3. I agree with you that “using the data twice” is not usually such a big deal. It’s a n vs. n-k sort of thing.

4. I’m not thrilled with the approach of picking the model with the larger p-value. There are lots of reasons this might not work so well. I’d prefer (a) fitting the larger model and taking a look at the inferences for a, b, and sigma; and maybe (b) fitting the smaller model and computing a chi-squared statistic to see if this model can be rejected from the data.

Still more . . .

And here’s more on the ever-irritating topic of people who go around telling you not to use posterior predictive checks because they “use the data twice.” Grrrrr… The posterior predictive distribution is p(y.rep|y). It’s fully Bayesan. Period.


  1. Cosma says:

    Andrew, your reply has gone missing!

  2. Ben Bolker says:

    Did your reply get cut off?

  3. Andrew says:

    Sorry; fixed now. It's amusing how I can remain so calm when writing about important political topics, but when people make a statistical error, it just steams me up!

  4. Devin Johnson says:


    I want to thank you for this blog. I'm a Bayesian statistician who works primarily with ecologists. In the past few months I've had 2 reviews (different papers) come back with questions about the "validity" or "coherence" of the predictive p-value. I spent many hours pouring over the arguments in the literature to make sure of how to respond (and figure out how the data is being used "twice" in a posterior distribution). And, trying to find ways to calibrate the p-value. Then one day I settled into a peaceful bliss when it dawned on me that the p-value is what it is, and I just need to interpret it exactly for it is. I think the term "p-value" throws everybody off. They want it to have all the right frequentist properties of the "p-value" they grew up with from intro stats classes. It's funny that they never seem to question my *95%* credible intervals with this much enthusiasm.

    Best wishes,