I used this expression the other day in Lauren’s seminar and she told me she’d never heard it before, which surprised me because I feel like I’ve been saying it for a while, so I googled *statmodeling bayesian cringe* but nothing showed up! So I guess I should write it up.

Eventually everything makes its way from conversation to blog to publication. For example, the earliest appearance I can find of “Cantor’s corner” is here, but I’d been using that phrase for a while before then, and it ultimately appeared in print (using the original ASCII art!) in this article in a physics journal.

So . . . the Bayesian cringe is this attitude that many Bayesian statisticians, including me, have had, in which we’re embarrassed to use prior information. We bend over backward to assure people that we’re estimating all our hyperparameters from the data alone, we say that Bayesian statistics is the quantification of uncertainty, and we don’t talk much about priors at all except as a mathematical construct. The priors we use are typically structural—not in the sense of “structural equation models,” but in the sense that the priors encode structure about the model rather than particular numerical values. An example is the 8 schools model—actually, everything in chapter 5 of BDA—where we use improper priors on hyperparameters and never assign any numerical or substantive prior information.

The Bayesian cringe comes from the attitude that non-Bayesian methods are the default and that we should only use Bayesian approaches when we have very good reasons—and even that isn’t considered enough sometimes, as discussed in section 3 of this article. So that’s led us to emphasize innocuous aspects of Bayesian inference. Now, don’t get me wrong, I think there are virtues to flat-prior Bayesian inference too. Not always—sometimes the maximum likelihood estimate is better, as in some multidimensional problems where the flat prior is actually very strong (see section 3 of this article) or just because it’s a mistake to take a posterior distribution too seriously if it comes from an unrealistic prior (see section 3 here)—but for the reasons given in BDA, I typically think that flat-prior Bayes is a step forward.

But I keep coming across problems where a little prior information really helps—see for example Section 5.9 here, it’s one of my favorite examples—and more and more I’ve been thinking that it makes sense to just start with a strong prior. Instead of starting with the default flat or super-weak prior, start with a strong prior (as here) and then retreat if you have prior information saying that this prior is too strong, that effects really could be huge or whatever.
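To make the "start strong, then retreat" idea concrete, here is a minimal sketch using the simplest possible setup: one noisy estimate with a known standard error and a normal prior centered at zero. The numbers (an estimate of 8 with standard error 4, prior standard deviations of 1 and 5) and the function name `normal_posterior` are made up for illustration and do not come from any of the linked examples.

```python
import math


def normal_posterior(estimate, std_error, prior_mean=0.0, prior_sd=math.inf):
    """Posterior mean and sd for a normal likelihood with a normal prior.

    prior_sd=math.inf stands in for the flat prior, in which case the
    posterior simply reproduces the raw estimate and its standard error.
    """
    if math.isinf(prior_sd):
        return estimate, std_error
    data_precision = 1.0 / std_error**2
    prior_precision = 1.0 / prior_sd**2
    post_var = 1.0 / (data_precision + prior_precision)
    post_mean = post_var * (data_precision * estimate + prior_precision * prior_mean)
    return post_mean, math.sqrt(post_var)


# Hypothetical noisy estimate: 8 with a standard error of 4.
noisy_estimate, std_error = 8.0, 4.0

flat = normal_posterior(noisy_estimate, std_error)
# Start with a strong prior: effects are expected to be around 1 or smaller.
strong = normal_posterior(noisy_estimate, std_error, prior_sd=1.0)
# Retreat to a weaker prior if there is reason to think effects could be large.
weaker = normal_posterior(noisy_estimate, std_error, prior_sd=5.0)

for label, (mean, sd) in [("flat", flat), ("strong", strong), ("weaker", weaker)]:
    print(f"{label:>6}: posterior mean {mean:.2f}, posterior sd {sd:.2f}")
```

With these made-up numbers the flat prior returns the raw estimate of 8, the strong prior pulls it down to about 0.47, and the weaker prior lands at about 4.88. The retreat is a single change to `prior_sd`, and it has to be justified by actual prior information that large effects are plausible.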

As the years have gone on, I’ve become more and more sympathetic with the attitude of Dennis Lindley. As a student I’d read his contributions to discussions in statistics journals and think, jeez what an extremist, he just says the same damn thing over and over. But now I’m like, yeah, informative priors, cut the crap, let’s go baby. As I wrote in 2009, I suspect I’d agree with Lindley on just about any issue of statistical theory and practice. I’ve read some of Lindley’s old articles and contributions to discussions and, even when he seemed like something of an extremist at the time, in retrospect he always seems to be correct.

One way we’ve moved away from the Bayesian cringe is by using the terminology of regularization. Remember how I said that lasso (and, more recently, deep nets) have made the world safe for regularization? And how I said that Bayesian inference is not radical but conservative (sorry, Lindley)? When we talk about regularization, we’re saying that this kind of partial-pooling-toward-the-prior is desirable in itself. Rather than being a regrettable concession to bias that we accept in order to control our mean squared error, we argue that stability is a goal in itself. (Conservative, you see?)
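The link between the regularization vocabulary and priors can be checked numerically in a toy case. The sketch below assumes a one-coefficient regression through the origin with known noise standard deviation and a normal prior centered at zero on the slope. Under those assumptions the ridge (penalized least squares) estimate with penalty `(noise_sd / prior_sd) ** 2` matches the posterior mean. The data and function names are invented for illustration.

```python
def ridge_slope(x, y, penalty):
    """Minimize sum((y - b*x)^2) + penalty * b^2 for a no-intercept model."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + penalty)


def posterior_mean_slope(x, y, noise_sd, prior_sd):
    """Posterior mean of b when y ~ N(b*x, noise_sd^2) and b ~ N(0, prior_sd^2)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    least_squares = sxy / sxx
    data_precision = sxx / noise_sd**2
    prior_precision = 1.0 / prior_sd**2
    # Precision-weighted average of the least squares slope and the prior mean (0).
    return data_precision * least_squares / (data_precision + prior_precision)


# Made-up data, roughly y = 2x plus noise.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
noise_sd, prior_sd = 1.0, 0.5

unpenalized = ridge_slope(x, y, penalty=0.0)
ridge = ridge_slope(x, y, penalty=(noise_sd / prior_sd) ** 2)
bayes = posterior_mean_slope(x, y, noise_sd, prior_sd)

print(f"least squares: {unpenalized:.3f}")
print(f"ridge:         {ridge:.3f}")
print(f"posterior:     {bayes:.3f}")
```

The ridge and posterior lines agree, and both sit between the least squares slope and the prior mean of zero. The penalty and the prior are the same partial pooling described in two vocabularies.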

We’re not completely over the Bayesian cringe—look at just about any regression published in a political science journal, and within econometrics there are still some old-school firebreathers of the anti-Bayesian type—but I think we’re gradually moving toward a general idea that it’s best to use all available information, with informative priors being one way to induce stability and thus allow us to fit more complicated, realistic, and better-predicting models.