Objective Bayesian priors are often improper (i.e., have infinite total mass), but this is not a problem when they are developed correctly. But not every improper prior is satisfactory. For instance, the constant prior is known to be unsatisfactory in many situations. The ‘solution’ pseudo-Bayesians often use is to choose a constant prior over a large but bounded set (a ‘weakly informative’ prior), saying it is now proper and so all is well. This is not true; if the constant prior on the whole parameter space is bad, so will be the constant prior over the bounded set. The problem is, in part, that some people confuse proper priors with subjective priors and, having learned that true subjective priors are fine, incorrectly presume that weakly informative proper priors are fine.
I have a few reactions to this:
1. I agree with Berger that improper priors can sometimes work OK. You can’t evaluate just part of a model: the prior distribution, data model, and actual data all fit together.
2. I’m not sure who Berger’s “pseudo-Bayesians” are, but I agree that it’s not a good idea to simply use a flat but bounded prior distribution. As I wrote the other day:
I see no particular purity in fitting a model with unconstrained parameter space: to me, it is just as scientifically objective, if not more so, to restrict the space to reasonable values. It often turns out that soft constraints work better than hard constraints, hence the value of continuous and proper priors.
I find the term “weakly informative priors” to be very useful, and I’d like to use this space to plead with Jim Berger to use the term not for bad ideas such as constant densities over bounded spaces but rather for priors that use some general information for the purposes of regularization (“keeping things unridiculous”) in sparse-data settings.
I don’t know if this helps, but when I do a Google Scholar search on weakly informative priors, the top two hits are my own papers, where we indeed get more reasonable and stable estimates than would be obtained using priors that are traditionally considered noninformative.
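To make the stabilization concrete, here is a toy sketch (the data, the prior scale, and the crude grid search are all invented for illustration, not taken from those papers): with completely separated data, the maximum likelihood estimate of a logistic regression slope runs off to infinity, while a weakly informative normal prior on the slope keeps the estimate finite and reasonable.

```python
import math

# Toy data with complete separation: every x < 0 has y = 0, every x > 0
# has y = 1, so the likelihood is maximized as the slope goes to infinity.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

def log_lik(b):
    # Logistic log-likelihood for slope b (no intercept, for simplicity).
    return sum(yi * b * xi - math.log(1.0 + math.exp(b * xi))
               for xi, yi in zip(x, y))

def log_post(b, scale=2.5):
    # Add a weakly informative normal(0, scale^2) log-prior on the slope
    # (the scale 2.5 is an arbitrary but plausible illustrative choice).
    return log_lik(b) - b * b / (2.0 * scale * scale)

def argmax(f, lo=-20.0, hi=20.0, steps=4001):
    # Crude grid search; fine for a one-parameter illustration.
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=f)

b_mle = argmax(log_lik)   # runs off to the edge of the grid
b_map = argmax(log_post)  # finite, regularized estimate

print(f"'MLE' on [-20, 20] grid: {b_mle:.2f}")  # hits the boundary
print(f"MAP with weak prior:     {b_map:.2f}")  # a moderate value
```

The point of the sketch is the qualitative contrast: the likelihood alone has no interior maximum here, so any answer it gives is an artifact of where you stop searching, while even a mild prior keeps things unridiculous.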
3. Berger writes that some people have “learned that true subjective priors are fine.” I don’t think that “true subjective priors” are necessarily fine! If the prior distribution is based on bad information, it might be pretty horrible. Or, more to the point, in any case, how do you know that a subjective prior is actually “true”?
I often do find it helpful or even necessary to include prior information in an analysis, but it would be too much to ask me to supply anything like a “true subjective prior.” The idea of weakly informative priors is to get some (but not all) of the benefit of the prior information while mitigating some (but not all) of the risk of including information that’s not really there.
To ground the discussion slightly, consider pp. 312-313 of my article with David Weakliem on sex ratios. We took real prior information from the scientific literature and we showed how it could be used in a Bayesian or non-Bayesian fashion (the latter using retrospective design analysis to show that, in the problem at hand, any plausible signal would be overwhelmed by noise). For certain purposes of decision analysis, it might have been appropriate for us to come up with our best subjective prior—really, it would be more of a model than a prior distribution, as it would need to account for our understanding of the measurement of beauty as well as the underlying parameter describing the relation of beauty to sex ratio—but for our purpose of understanding the limitations of the data in this particular problem, it was enough to use weak prior information.
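The signal-overwhelmed-by-noise point can be sketched with a minimal conjugate normal calculation. The numbers below are invented for illustration and are not the figures from the Weakliem article: a noisy estimate combined with a nearly flat prior just reproduces the raw estimate, while a weakly informative prior encoding “effects of this kind are tiny” pulls the estimate back to a plausible magnitude.

```python
# Conjugate normal-normal updating for a single effect estimate.
# theta ~ N(prior_mean, prior_sd^2), y | theta ~ N(theta, se^2).
# All numbers are hypothetical, chosen only to illustrate the shrinkage.

def posterior(y, se, prior_mean, prior_sd):
    # Standard precision-weighted combination of prior and data.
    prec = 1.0 / se**2 + 1.0 / prior_sd**2
    mean = (y / se**2 + prior_mean / prior_sd**2) / prec
    return mean, prec ** -0.5

y, se = 8.0, 3.0  # e.g., an '8 percentage point' difference with se 3

# Nearly flat prior: the posterior is essentially the raw estimate.
m_flat, s_flat = posterior(y, se, 0.0, 1000.0)

# Weakly informative prior: the literature says such effects are
# at most a fraction of a percentage point.
m_weak, s_weak = posterior(y, se, 0.0, 0.5)

print(f"flat-ish prior: {m_flat:.2f} +/- {s_flat:.2f}")
print(f"weak prior:     {m_weak:.2f} +/- {s_weak:.2f}")
```

Under the weak prior the estimate shrinks most of the way to zero, which is the analysis-level version of the retrospective-design point: when the noise dwarfs any plausible signal, the data barely move a sensible prior.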
In summary . . .
I see two issues here. One is a simple question of terminology. O’Hagan uses the term “weakly informative” for what are commonly considered reference or noninformative priors. Berger considers weakly informative priors to be a particularly bad choice involving flat distributions over a constrained space. I define weakly informative priors as containing real prior information, just less than one might have for any particular problem. Since I actually use these priors in my own applied work, I think I should get to decide the name! In any case, though, it’s probably good to point out the multiple definitions.
Beyond this, I think there’s a statistical point that Berger is missing, which is that there’s something between attempts at noninformativeness (on one hand) and a fully informative prior (on the other). To me, that’s the key idea of weak prior information: as O’Hagan writes, it can often be “difficult to formulate a genuine prior distribution carefully” (or, as I might put it, difficult to set up a full probability model). But at that point we don’t need to retreat to noninformativity; we can take a halfway point and set up a weakly informative prior that includes some, but not all, of the substantive information that is available.
I think this is an important concept in Bayesian statistics, which is why I’m speaking on it and writing long blog posts such as this one.