(This is not a paper we wrote by mistake.)
(This is also not Andrew)
(This is also really a blog post about one aspect of the paper, which mostly focusses on issues around visualisation and how visualisation can improve workflow. So you should read it.)
Recently Australians have been living through a predictably ugly debate around marriage equality. But of course, every country that lives through this humiliating experience needs to add its own home-grown, jingoistic, and weirdly specific idiocy. This came in the guise of Senator Eric Abetz, who claimed that marriage equality would open the “floodgates” to people rushing to marry the Sydney Harbour Bridge.
Why is this relevant? Well, because there is frequent and vociferous opposition to procedures that “use the data twice”. This always surprises me, because when I do applied work [which as yet has not involved a pre-designed RCT] I look at the data so often that I’ve basically married it. In fact, during my last consulting job, the data and I had seven children and settled down in a beautiful house in the country.
We saw a problem
So why is there so much dissonance between people who write about methodological statistics and my own experience doing applied statistics? (I am willing to consider the idea that I could be better at statistics. Because it is hard!)
A big part of the dissonance comes from two sources:
1) If Null Hypothesis Significance Testing is happening in any guise (see Bayes factors for model choice, for a Bayesian example), then using the data twice violates every aspect of the theoretical assumptions. This is one of the many reasons why NHST is difficult to perform correctly.
2) If we are trying to build a predictive model (a general task, remembering that a causal model is just a predictive model for a scenario we haven’t seen), then using the data twice can lead to overfitting and massive generalisation error.
Of these two problems, I care deeply about the second (as for the first: I care not a sausage).
So we wrote a paper about it
This is a very long way around to the point that we [Jonah, me, Aki, Michael, and Andrew] wrote a nice paper called Visualization in Bayesian workflow.
The paper is about how visualisation [I’m not American. I spell it correctly.] is a critical part of a Bayesian’s applied workflow.
The critical idea that we’re playing with is that you build up a model piece-by-piece, in part by looking at the data and seeing what it can support. In the paper we cover a bunch of ideas about how visualisation is important in this process (from building a model, to checking that your MCMC worked, to doing posterior checks, to doing model comparison).
This workflow inevitably uses the data more than once. But, as we wrote in the final paragraph:
Regardless of the concerns we have about using the data twice, the workflow that we have described in this paper (perhaps without the stringent prior and posterior predictive checks) is common in applied statistics. As academic statisticians, we have a duty to understand the consequences of this workflow and offer concrete suggestions to robustify practice. We firmly believe that theoretical statistics should be the theory of applied statistics, otherwise it’s just the tail trying to wag the dog.
A better use of prior predictive distributions
As we talked about in the last paper, understanding that a marginal likelihood is just the prior predictive density evaluated at the observed data is key to understanding why it’s so fragile when used for model comparison. (Really, we should call marginal likelihoods “leave-everything-out cross validation”.)
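To spell that out (this is the standard definition, not anything specific to our paper), the marginal likelihood is

p(y) = \int p(y \mid \theta) \, p(\theta) \, d\theta ,

which is exactly the prior predictive density at the observed data y. Every part of the prior contributes to that integral, including the regions of parameter space the data would never get a chance to correct, which is why small changes to a vague prior can change the marginal likelihood by orders of magnitude.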
But that does not mean the prior predictive distribution is useless; it just means it’s not the right thing to look at when doing model comparison. In Section 3, we suggest using the prior predictive distribution as an additional check to monitor how informative your priors are jointly. In particular, we argue that the prior predictive distribution should be “weakly informative” in the same way that priors on individual parameters should be weakly informative.
That is, the data generated using the prior (and the fixed elements of the design, in this case the spatial locations and the covariate values) should cover the range of “reasonable” data values and a little beyond. For this application, we know that the reasonable range is below about 200 micrograms per cubic metre, so Figure 4(b) suggests we could probably tighten the priors even further (although they are probably fine).
How do you do this in practice? Well, I write an R script that fills a directory with plots of 100 or so realisations from the prior predictive distribution, and then I flip through them. You could also look at the distributions of relevant summary statistics. A sketch of the sort of script I mean is below.
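Here the model and the priors are made up purely for illustration (a simple Gaussian linear regression with weakly informative priors, not the model from the paper); the point is just the mechanics of simulating datasets from the prior predictive and saving one plot per draw:

library(ggplot2)

n_draws <- 100   # number of prior predictive datasets
n_obs   <- 50    # size of each fake dataset
x <- runif(n_obs, 0, 10)   # fixed design: covariate values

dir.create("prior_pred", showWarnings = FALSE)

for (i in seq_len(n_draws)) {
  # Draw parameters from (hypothetical) weakly informative priors
  alpha <- rnorm(1, 0, 1)
  beta  <- rnorm(1, 0, 1)
  sigma <- abs(rnorm(1, 0, 1))   # half-normal prior on the noise scale
  y_sim <- rnorm(n_obs, alpha + beta * x, sigma)

  # One plot per draw; flip through the directory afterwards
  p <- ggplot(data.frame(x = x, y = y_sim), aes(x, y)) +
    geom_point() +
    ggtitle(paste("Prior predictive draw", i))
  ggsave(file.path("prior_pred", sprintf("draw_%03d.png", i)),
         p, width = 5, height = 4)
}

Storing, say, max(y_sim) for each draw and plotting a histogram gets you the summary-statistic version of the same check.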
In the end, prior predictive checks and posterior predictive checks are both important in the statistical workflow. But your prior predictive checks make most sense before you do things with your data, and posterior predictive checks make most sense after. This also lets us avoid computing a marginal likelihood, which we all should know by now is extremely difficult.
Almost all of the plots in the paper were done using Jonah’s package bayesplot or using ggplot2. A few of the plots are actually not available in the current version of bayesplot: unsurprisingly, when you write a research paper about visualisation techniques, it turns out you need to implement new ones!
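For a flavour of what’s already there: one of the standard posterior predictive checks in bayesplot is a density overlay. In practice y and yrep would come from your data and your fitted model (e.g. via posterior_predict in rstanarm); they’re faked here just so the sketch runs on its own:

library(bayesplot)

# y: observed data (length n); yrep: S x n matrix of posterior
# predictive draws. Both are faked here purely for illustration.
y    <- rnorm(50, 2, 1)
yrep <- matrix(rnorm(200 * 50, 2, 1), nrow = 200)

# Overlay the density of the observed data on the densities of
# the replicated datasets
ppc_dens_overlay(y, yrep)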