David Kessler, Peter Hoff, and David Dunson write:

Marginally specified priors for nonparametric Bayesian estimation

Prior specification for nonparametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. Realistically, a statistician is unlikely to have informed opinions about all aspects of such a parameter, but may have real information about functionals of the parameter, such as the population mean or variance. This article proposes a new framework for nonparametric Bayes inference in which the prior distribution for a possibly infinite-dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a nonparametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard nonparametric prior distributions in common use, and inherit the large support of the standard priors upon which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard nonparametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modeling of high-dimensional sparse contingency tables.
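A minimal toy sketch of the decomposition idea, not the paper's actual construction: draw the mean functional psi from an informative prior, draw a random discrete distribution as a stand-in for a nonparametric prior draw, then recenter it so its mean equals psi. All the specific distributions here (normal prior on psi, normal atoms, Dirichlet weights) are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_marginally_specified_density(n_atoms=50):
    """Toy sketch: informative prior on the mean functional, then a
    random discrete distribution adjusted to match that functional."""
    # Informative prior on the mean functional psi (assumed N(0, 1) here).
    psi = rng.normal(0.0, 1.0)
    # Stand-in for a nonparametric draw: random atoms with Dirichlet weights.
    atoms = rng.normal(0.0, 3.0, size=n_atoms)
    weights = rng.dirichlet(np.ones(n_atoms))
    # Condition on the functional by shifting the atoms so the mean is psi.
    atoms = atoms - np.sum(weights * atoms) + psi
    return atoms, weights, psi

atoms, weights, psi = draw_marginally_specified_density()
print(abs(np.sum(weights * atoms) - psi) < 1e-10)  # mean matches psi
```

The point of the sketch is only the two-part structure: the marginal law of the mean is exactly the informative prior, while the shape of the distribution is left to the "nonparametric" part.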

This seems very important to me, also I love the idea of Hoff and Dunson on the same paper, sort of like one of those 70’s supergroups.

Which one is Dieter and which one is the Monkeyman?

So it’s nonparametric piling on.

This is basically the maximum entropy problem. The functionals of the high-dimensional parameter are the constraints. If you construct the maxent solution subject to those constraints, you'll get a prior for the parameters whose probability mass is spread out as much as possible.
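For a finite support and a mean constraint, the maxent solution has a known closed form: exponential tilting, p_i ∝ exp(λ x_i), with λ chosen to hit the constraint. A small sketch using the classic six-sided-die example (my own illustration, not from the comment), solving for λ by bisection since the mean is monotone increasing in λ:

```python
import numpy as np

def maxent_pmf(support, target_mean, tol=1e-10):
    """Maximum-entropy pmf on a finite support subject to a mean
    constraint; the solution is exponential tilting p_i ∝ exp(lam * x_i)."""
    def mean_at(lam):
        w = np.exp(lam * (support - support.max()))  # stabilized exponentials
        p = w / w.sum()
        return p @ support
    # Bisection works because d(mean)/d(lam) = variance > 0.
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_at(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = np.exp(lam * (support - support.max()))
    return w / w.sum()

support = np.arange(1, 7, dtype=float)      # faces of a die
p = maxent_pmf(support, target_mean=4.5)    # Jaynes's "loaded die" example
print(p @ support)  # ≈ 4.5: constraint satisfied, entropy maximized
```

With a target mean above 3.5 the tilt λ is positive, so the weights increase monotonically across the faces, which is the familiar shape of Jaynes's answer to this problem.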

Many people object to the maxent solution for one reason or another, so it's worth considering what you're giving up if you use something else. Any other solution will have a smaller entropy, where the entropy is roughly S ~ ln W for the "size" W of the high-probability region. Since the alternate solution has a smaller high-probability region, there is the possibility that the true parameters, which are presumably consistent with the functional constraints, lie in a low-probability area of this lower-entropy prior. If that happens, using the prior is likely a bad idea, since it effectively rules out the true parameters before you've even considered the data, for no reason given in our prior information.
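A toy illustration of the S ~ ln W point (my own example): two pmfs on a die satisfy the same mean constraint, but the non-maxent one concentrates on far fewer outcomes, i.e. has a much smaller effective size W = e^S, so it silently rules out parameter values the constraint alone never excluded.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability atoms."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

support = np.arange(1, 7, dtype=float)
uniform = np.full(6, 1 / 6)                  # the maxent pmf for mean 3.5
spiky = np.array([0.5, 0, 0, 0, 0, 0.5])     # same mean 3.5, less spread

# Both satisfy the functional constraint...
print(uniform @ support, spiky @ support)    # 3.5 and 3.5
# ...but the effective sizes W = exp(S) differ sharply:
print(np.exp(entropy(uniform)))  # ≈ 6: all outcomes in play
print(np.exp(entropy(spiky)))    # ≈ 2: four faces effectively ruled out
```

If the truth happened to be one of those four middle faces, the spiky prior has ruled it out for no reason contained in the constraint, which is exactly the risk described above.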

Of course, practical considerations may limit the ability to construct the maxent solution, but you'd still want a prior as close to the maxent solution as is practically feasible.

Yes, I was thinking the same thing.

[…] Gelman links to a paper on priors for Bayesian nonparametric statistics (and has a wonderful in-joke in the title for those […]

Very interesting. Doing something like this had crossed my mind when I read about the [“Robins-Wasserman” paradox](http://normaldeviate.wordpress.com/2012/08/28/robins-and-wasserman-respond-to-a-nobel-prize-winner/); maybe we would put a prior on the functional psi and then specify the prior on theta(x) conditionally on psi. It seemed to me that what was going wrong was that putting a typical prior on theta(x) is unintentionally highly informative about the functional of interest psi. Anyway, *very* excited to read this paper.