In a paper to appear in Statistics and Computing, Ying Liu, Tian Zheng, and I write:

Bayesian highest posterior density (HPD) intervals can be estimated directly from simulations via empirical shortest intervals. Unfortunately, these can be noisy (that is, have a high Monte Carlo error). We derive an optimal weighting strategy using bootstrap and quadratic programming to obtain a more computationally stable HPD, or in general, shortest probability interval (Spin). We prove the consistency of our method. Simulation studies on a range of theoretical and real-data examples, some with symmetric and some with asymmetric posterior densities, show that intervals constructed using Spin have better coverage (relative to the posterior distribution) and lower Monte Carlo error than empirical shortest intervals. We implement the new method in an R package (SPIn) so it can be routinely used in post-processing of Bayesian simulations.

This is one of my pet ideas but it took a long time to get it working. I have to admit I’m still not thrilled with the particular method we’re using—it works well on a whole bunch of examples, but the algorithm itself is a bit clunky. I have a strong intuition that there’s a much cleaner version that does just as well while preserving the basic idea, which is to get a stable estimate of the shortest interval at any given probability level (for example, 0.95) given a bunch of posterior simulations. Once we have this cleaner algorithm, we’ll stick it into Stan, as there are lots of examples (starting with the hierarchical variance parameter in the famous 8-schools model) where a highest probability density interval (or, equivalently, shortest probability interval) makes a lot more sense than the usual central interval.
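The building block here, the empirical shortest interval, is easy to sketch from a vector of posterior draws: sort the draws, slide a window that covers the required fraction of them, and keep the narrowest window. This is a minimal Python/NumPy illustration of that naive estimator (the one whose Monte Carlo noise Spin is designed to reduce), not the paper's weighting method; the function name is mine:

```python
import numpy as np

def empirical_shortest_interval(draws, prob=0.95):
    """Shortest interval containing a fraction `prob` of the draws:
    slide a fixed-size window over the sorted draws, keep the narrowest."""
    x = np.sort(np.asarray(draws))
    n = len(x)
    k = int(np.ceil(prob * n))          # number of draws the interval must cover
    widths = x[k - 1:] - x[:n - k + 1]  # width of each candidate window
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

# For right-skewed draws the shortest interval hugs zero,
# unlike the central interval.
rng = np.random.default_rng(1)
draws = rng.exponential(size=100_000)
lo, hi = empirical_shortest_interval(draws)
```

The min-over-windows step is exactly where the noise comes from: with few draws, many windows have nearly the same width, so the argmin jumps around from one simulation to the next.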

Maybe I’m missing it, but where was this published? (Love the idea and R package btw).

Christopher:

I added the reference and link above.

From the figures in your paper the differences between central vs. HPD / Spin intervals seem mostly very small. Is that a correct assessment?

I haven’t looked at the paper yet but it’s easy to construct examples where they’re large. For example, any bimodal distribution.

I don’t really get the love for HPDs. They’re not invariant to parameterization, so you (generic you) have to argue that the parameterization you’re using is special somehow. For instance, there’s a case to be made that the log transformation is most appropriate for scale parameters, and then the whole “mode on the boundary of the parameter space” thing that seems to be the primary point in favor of HPDs vanishes…

Corey:

I don’t think the central interval makes sense for the group-level variance parameter “tau” in the 8-schools problem, at least not for the version with a uniform prior on tau.

I can accept the argument that the uniform prior does not make sense, based on the reasoning that we must really have some prior information that tau cannot be 0.00001 or whatever. But, conditional on using that prior, the central interval doesn’t make sense to me, as the points on the lower end of the interval (near tau=0) are actually all well supported by the data.
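To make that concrete with simulated draws: for a posterior piled up against zero, the central 95% interval cuts off the best-supported values near zero, while the shortest interval keeps them. A toy check (NumPy assumed; an exponential stands in for the posterior of tau):

```python
import numpy as np

rng = np.random.default_rng(3)
tau = rng.exponential(size=100_000)  # stand-in for a posterior piled up near 0

# Central 95% interval: equal 2.5% tail probability on each side.
central = np.quantile(tau, [0.025, 0.975])

# Shortest 95% interval: narrowest window covering 95% of the sorted draws.
x = np.sort(tau)
k = int(np.ceil(0.95 * len(x)))
i = int(np.argmin(x[k - 1:] - x[:len(x) - k + 1]))
shortest = (x[i], x[i + k - 1])
```

The central interval's lower endpoint sits strictly above zero, excluding the highest-density region, while the shortest interval starts at (essentially) zero and is narrower overall.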

This sort of thing comes up when you do Bayes with a noninformative, or weakly informative, prior. In such settings, the likelihood takes on a more important role. For example, in the 8 schools example, you can’t put a uniform prior on log(tau).

So I’m trying to imagine what your objection would be to the HPD for tau on the log scale, and from what you’ve written above, I’m guessing it would be that the uniform prior on tau has an exponential left tail on the log scale, and this is what bounds the left edge of the HPD away from tau = 0. You seem to be saying that in the noninformative or weakly informative setting, what you want is not an inferential summary that reflects the posterior but rather an inferential summary that reflects what the data are indicating through the (possibly integrated) likelihood. Do I understand your position correctly?