(This is not by Andrew)
This is a paper we (Gelman, Simpson, Betancourt) wrote by mistake.
The paper in question, recently arXiv’d, is called “The prior can generally only be understood in the context of the likelihood”.
How the sausage was made
Now, to be very clear (and because I’ve been told since I moved to North America that you are supposed to explicitly say these things rather than just work on the assumption that everyone understands that there’s no way we’d let something sub-standard be seen by the public) this paper turned out very well. But it started with an email where Andrew said “I’ve been invited to submit a paper about priors to a special issue of a journal, are you both interested?”.
Why did we get this email? Well Mike and I [Simpson], along with a few others, have been working with Andrew on a paper about weakly informative priors that has been stuck in the tall grass for a little while. And when I say it’s been stuck in the tall grass, I mean that I [Simpson] got mesmerised by the complexity and ended up stuck. This paper has gotten us out of the grass. I’d use a saying of my people (“you’ve got to suffer through Henry Street to make it to People”) except this paper is not Henry Street. This paper is good. (This paper is also not People, so watch this space…)
So over a fairly long email thread, we worked out that we were interested and we carved out a narrative and committed to the idea that writing this paper shouldn’t be a trauma. Afterwards, it turned out that individually we’d each understood the end of that conversation differently, leading in essence to three parallel universes that we were each playing in (eat your heart out Sliders).
Long story short, Andrew went on holidays and one day emailed us a draft of the short paper he had thought we were writing. I then took it and wrestled it into a draft of the short paper I thought we were writing. Mike then took it and wrestled it into the draft of the short paper he thought we were writing. And so on and so forth. At some point we converged on something that (mostly) unified our perspectives and all of a sudden, this “low stakes” paper turned into something that we all really wanted to say.
Connecting the prior and the likelihood
So what is this paper about? Well it’s 13 pages, you can read it. But it covers a few big points:
1) If you believe that priors are not important to Bayesian analysis, we have a bridge we can sell you. This is particularly true for complex models, where the structure of the posterior may lead to certain aspects of the prior never washing away with more data.
2) Just because you have a probability distribution, doesn’t mean you have a prior. A prior connects with a likelihood to make a *generative model* for new data and when we understand it in that context, weakly informative priors become natural.
3) This idea is understood by a lot of the classical literature on prior specification like reference priors etc. These methods typically use some sort of asymptotic argument to remove the effect of the specific realisation of the likelihood that is observed. The resulting prior then leans heavily on the assumption that this asymptotic argument is valid for the data that is actually being observed, which often does not hold. When the data are far from asymptopia, the resulting priors are too diffuse and can lead to nonsensical estimates.
Generative models are the key
4) The interpretation of the prior as a distribution that couples with the likelihood to build a generative model for new data is implicit in the definition of the marginal likelihood, which is just the density of this generative distribution evaluated at the observed data. This makes it easy to understand why improper priors cannot be used for Bayes factors (ratios of marginal likelihoods): they do not produce generative models. (In a paper that will come out later this week, we make a pretty good suggestion for a better use of the prior predictive.)
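To see the improper-prior problem in the smallest possible setting, here is a sketch using a toy model that is my assumption and not something taken from the paper: a single observation y ~ Normal(theta, 1) with prior theta ~ Normal(0, tau^2). Integrating out theta gives y ~ Normal(0, 1 + tau^2), so the marginal likelihood has a closed form. The function names and the value of y_obs are made up for illustration.

```python
import math


def log_marginal_likelihood(y, tau):
    """log p(y) when y | theta ~ Normal(theta, 1) and theta ~ Normal(0, tau^2).

    Integrating theta out of the joint gives y ~ Normal(0, 1 + tau^2).
    """
    var = 1.0 + tau ** 2
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * y ** 2 / var


def posterior_mean(y, tau):
    """Posterior mean of theta in the same toy model (conjugate update)."""
    return y * tau ** 2 / (1.0 + tau ** 2)


y_obs = 1.5  # a made-up observation, purely illustrative

# Widen the prior and watch what happens to each quantity.
for tau in [1.0, 10.0, 100.0, 1000.0, 1e6]:
    lml = log_marginal_likelihood(y_obs, tau)
    pm = posterior_mean(y_obs, tau)
    print(f"tau={tau:>9g}  log p(y)={lml:9.3f}  E[theta | y]={pm:.6f}")
```

In this toy model the posterior mean settles down to y_obs as tau grows, while log p(y) keeps falling by roughly log(tau). In the flat-prior limit the posterior is perfectly usable but the marginal likelihood has drifted off to zero, so any ratio built from it reflects how wide you happened to make the prior.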
5) Understanding what a generative model means also makes it clear why any decision (model choice or model averaging) that involves the marginal likelihood leans very heavily on the prior that has been chosen. If your data is y and the likelihood is p(y | theta), then the generative model makes new data as follows:
– Draw theta ~ p(theta) from the prior
– Draw y ~ p(y | theta).
So if, for example, p(theta) has very heavy tails (like a half-Cauchy prior on a standard deviation), then occasionally the data will be drawn with an extreme value of theta.
This means that the entire prior will be used for making these decisions, even if it corresponds to silly parts of the parameter space. This is why we strongly advocate using posterior predictive distributions for model comparison (LOO scores) or model averaging (predictive stacking).
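The two-step recipe above is easy to simulate. What follows is a minimal sketch under assumptions of my own choosing, not the paper's: theta is a standard deviation sigma, the likelihood is y ~ Normal(0, sigma), and the half-Cauchy(0, 1) prior is compared with a half-normal(0, 1). All the function names are hypothetical, and it only needs the standard library.

```python
import math
import random


def draw_half_cauchy(rng, scale=1.0):
    """Half-Cauchy(0, scale) draw: inverse-CDF Cauchy draw folded at zero."""
    u = rng.random()
    return abs(scale * math.tan(math.pi * (u - 0.5)))


def draw_half_normal(rng, scale=1.0):
    """Half-normal(0, scale) draw: normal draw folded at zero."""
    return abs(rng.gauss(0.0, scale))


def prior_predictive(draw_sigma, n_draws, seed):
    """Simulate new data from the generative model (prior + likelihood)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_draws):
        sigma = draw_sigma(rng)           # step 1: draw theta from the prior
        ys.append(rng.gauss(0.0, sigma))  # step 2: draw y given theta
    return ys


def fraction_beyond(ys, cutoff):
    """Share of simulated data points with |y| larger than cutoff."""
    return sum(abs(y) > cutoff for y in ys) / len(ys)


ys_cauchy = prior_predictive(draw_half_cauchy, 20000, seed=1)
ys_normal = prior_predictive(draw_half_normal, 20000, seed=1)

for name, ys in [("half-Cauchy", ys_cauchy), ("half-normal", ys_normal)]:
    print(f"{name:>11}: share with |y| > 10 = {fraction_beyond(ys, 10.0):.4f}, "
          f"largest |y| = {max(abs(y) for y in ys):.1f}")
```

The cutoff of 10 is arbitrary. The thing to look at is how much of the half-Cauchy generative model's simulated data lands far out in the tails, because the marginal likelihood averages over all of it.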
So when can you safely use diffuse priors? (Hint: not often)
6) Enjoying, as I do, giving slightly ludicrous talks, I recently gave one called “you can only be uninformative if you’re also being unambitious”. This is an under-appreciated point about Bayesian models for complex data: the directions that we are “vague” in are the directions where we are assuming the data is so strong that that aspect of the model will be unambiguously resolved. This is a huge assumption and one that should be criticised.
So turn around bright eyes. You really need to think about your prior!
PS. If anyone is wondering where the first two sentences of the post went, they weren’t particularly important and I decided that they weren’t particularly well suited to this forum.