Yuling prepared this poster summarizing our recent work on path sampling using a continuous joint distribution. The method is really cool and represents a real advance over what Xiao-Li and I were doing in our 1998 paper. It’s still gonna have problems in high or even moderate dimensions, and ultimately I think we’re gonna need something like adiabatic Monte Carlo, but I think that what Yuling and I are doing is a step forward in that we’re working with HMC and path sampling together, and our algorithm, while not completely automatic, is closer to automatic than other tempering schemes that I’ve seen.

Hi Andrew, if I understand the “path sampling” idea, it’s that you’re tempering between some kind of “easy” distribution and the actual distribution of interest p(theta | y), by introducing a new parameter a and a link function f(a) that is constant when a is near 1.
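To pin the idea down, here’s a minimal sketch of what such a path density could look like. The piecewise-linear f below and the flat region [0.8, 1.2] are just illustrative choices on my part, not necessarily the actual link function in the paper:

```python
import numpy as np

# Illustrative link function f(a): ramps from 0 up to 1 and is flat (equal to 1)
# on [0.8, 1.2], so the joint density there is exactly the target distribution.
def f(a):
    return float(np.clip(min(a / 0.8, (2.0 - a) / 0.8), 0.0, 1.0))

def log_q_joint(theta, a, log_p_easy, log_p_target):
    """Unnormalized log density of a geometric path between an 'easy'
    distribution and the target, indexed by the tempering parameter a."""
    lam = f(a)
    return (1.0 - lam) * log_p_easy(theta) + lam * log_p_target(theta)
```

Anywhere f(a) = 1, draws of theta from this joint are draws from the target.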

Next, I think you could make it so that the prior p(a) is flat near a=1 and has some nontrivial mass near a=0 and a=2, and it seems like you could just sample from this new joint distribution, restrict to the region a in [0.8, 1.2], and voila, you’re done. Why do we need to estimate the normalization constant at all? What am I missing?

In fact, couldn’t you code up your model in Stan, and use something like a ~ uniform(0,2) and then just subset the posterior samples to the ones in the appropriate a range, and voila potentially awesome tempering? The bigger issue would then be to do something like adjust the “easy” distribution until you got relatively high effective sample size so that the tempering actually helps.
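Here’s a toy version of that idea in Python rather than Stan; everything in it (the normal “easy” and target densities, the piecewise-linear link, the step sizes) is invented for the demo. Run a sampler on the joint (theta, a) with a flat prior on (0, 2), then keep only the draws with a in [0.8, 1.2], where the joint coincides with the target:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(a):  # illustrative link: flat (= 1) for a in [0.8, 1.2]
    return min(max(min(a / 0.8, (2.0 - a) / 0.8), 0.0), 1.0)

def log_joint(theta, a):
    if not 0.0 < a < 2.0:                  # a ~ uniform(0, 2)
        return -np.inf
    lam = f(a)
    log_easy = -0.5 * theta**2 / 9.0       # "easy": N(0, 3^2), unnormalized
    log_target = -0.5 * (theta - 4.0)**2   # target: N(4, 1), unnormalized
    return (1.0 - lam) * log_easy + lam * log_target

# Random-walk Metropolis on (theta, a); Stan's HMC would do better, this is a demo.
theta, a, draws = 0.0, 0.5, []
for _ in range(20000):
    th_prop, a_prop = theta + rng.normal(0.0, 1.0), a + rng.normal(0.0, 0.2)
    if np.log(rng.uniform()) < log_joint(th_prop, a_prop) - log_joint(theta, a):
        theta, a = th_prop, a_prop
    draws.append((theta, a))

# Subset to a in [0.8, 1.2]: there f(a) = 1, so these draws target p(theta | y).
kept = [th for th, aa in draws if 0.8 <= aa <= 1.2]
```

In this toy case the subsetting works because the marginal of a happens to be well behaved; the catch is that in general it need not be.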


Daniel:

We did code up our model in Stan, and you do get this automatic tempering, but you don’t get direct control over the marginal distribution of that parameter a. In the above example, if you just set up the joint distribution, the marginal distribution p(a) varies by dozens of orders of magnitude (see the center lower graph above), hence you need to estimate p(a)—that’s where the path sampling comes in—and adjust for it. That part actually works kind of OK; the real difficulty in general is sampling from the joint distribution p(a,theta); in general, this distribution might have horrible geometry and not be easy to sample from using HMC (and it would be even more difficult to sample from using Gibbs or Metropolis).
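To show the path-sampling step in miniature: the identity is d/da log z(a) = E_{theta|a}[d/da log q(theta, a)], so you average that derivative over simulations at each grid value of a and integrate over the grid. Here’s a toy check on a path where z(a) is known in closed form (a sketch of the general identity, not our actual implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Path q(theta | a) = exp(-a * theta^2 / 2): here z(a) = sqrt(2*pi/a) exactly,
# so we can check the path-sampling estimate against the true answer.
grid = np.linspace(0.25, 1.0, 31)
u_hat = []
for a in grid:
    theta = rng.normal(0.0, 1.0 / np.sqrt(a), size=20000)  # exact draws at this a
    u_hat.append(np.mean(-0.5 * theta**2))                  # d/da log q = -theta^2/2

# Trapezoid rule over the grid estimates log z(1) - log z(0.25).
u = np.array(u_hat)
log_z_diff = float(np.sum(0.5 * (u[1:] + u[:-1]) * np.diff(grid)))

exact = 0.5 * np.log(0.25)  # log z(1) - log z(0.25), about -0.69
```

In the real problem the draws at each a come from the joint sampler rather than exactly, and the estimated log z(a) is what you use to adjust the marginal of a.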

As long as the integral of p(a|y) over [0.8, 1.2] is nontrivial, so that you get samples in the region where the thing you’re sampling from is the distribution of interest, why should we care about the marginal p(a|y)? If what you’re saying is that you’re trying to guarantee a reasonable number of samples in that region by adjusting the prior p(a) so as to force the whole scheme to stay in the region where a takes on the values you need… then yes, I see what you mean.

Also, it seems to me that rather than adjusting the prior p(a), you could work on adjusting the form of the “simple” distribution so that it results in a better marginal p(a|y). Both p(a) and phi(a) are, after all, under your control and pure instruments of computation in some sense: it really doesn’t matter what they are, just that they result in good sampling from p(theta | y, a near 1).

One particularly interesting possibility, I think, is to build the tempered model as a highly simplified version of the likelihood of your real model: create some model that uses only a few core parameters, and temper between that model, with large prediction/measurement error, and the more detailed model.
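A sketch of what that kind of bridge could look like, with both models and the error scales invented purely for illustration (the “detailed” model here is a linear trend with tight errors, the “simple” one a constant with inflated error):

```python
import numpy as np

def log_lik_detailed(theta, y):
    # hypothetical detailed model: linear trend with tight measurement error
    mu = theta[0] + theta[1] * np.arange(len(y))
    return float(np.sum(-0.5 * (y - mu) ** 2 / 0.5**2))

def log_lik_simple(theta, y):
    # hypothetical simple model: one core parameter, large error => easy to sample
    return float(np.sum(-0.5 * (y - theta[0]) ** 2 / 5.0**2))

def log_path(theta, y, lam):
    # geometric bridge between the two likelihoods; lam plays the role of f(a)
    return (1.0 - lam) * log_lik_simple(theta, y) + lam * log_lik_detailed(theta, y)
```

At lam = 0 the sampler sees only the forgiving simple model; at lam = 1 it sees the full likelihood.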