Also, it seems to me that rather than adjusting the prior p(a), you could adjust the form of the "simple" distribution so that it results in a better marginal p(a|y). Both p(a) and phi(a) are, after all, under your control and, in some sense, pure instruments of computation: it really doesn't matter what they are, only that they yield good sampling from p(theta | y, a near 1).

We did code up our model in Stan, and you do get this automatic tempering, but you don't get direct control over the marginal distribution of that parameter a. In the above example, if you just set up the joint distribution, the marginal distribution p(a) varies by dozens of orders of magnitude (see the lower center graph above), hence you need to estimate p(a)—that's where the path sampling comes in—and adjust for it. That part actually works kind of OK; the real difficulty in general is sampling from the joint distribution p(a, theta). That distribution might have horrible geometry and not be easy to sample from using HMC (and it would be even more difficult to sample from using Gibbs or Metropolis).
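To make the path-sampling step concrete, here is a minimal sketch of thermodynamic integration on a conjugate normal toy model where the tempered posterior at each a is available in closed form, so we can draw exact samples and check the answer. The model, the observation y, and the grid are all invented for illustration; in a real problem the draws at each a would come from HMC (e.g. in Stan), which is exactly where the hard geometry shows up.

```python
import numpy as np

rng = np.random.default_rng(0)
y = 1.5  # a single toy observation (hypothetical, for illustration only)

def loglik(theta):
    # log N(y | theta, 1)
    return -0.5 * np.log(2 * np.pi) - 0.5 * (y - theta) ** 2

# Path of tempered posteriors p_a(theta) ∝ N(theta; 0, 1) * N(y; theta, 1)^a
a_grid = np.linspace(0.0, 1.0, 51)
u_hat = []
for a in a_grid:
    # In this conjugate toy, p_a is normal with precision 1 + a,
    # so we can sample it exactly instead of running MCMC at each a.
    var = 1.0 / (1.0 + a)
    mean = a * y * var
    theta = rng.normal(mean, np.sqrt(var), size=20000)
    u_hat.append(loglik(theta).mean())  # Monte Carlo estimate of E_a[log p(y|theta)]

# Path-sampling identity: log Z(1) - log Z(0) = ∫_0^1 E_a[log p(y|theta)] da,
# approximated here by the trapezoid rule over the grid.
u_hat = np.array(u_hat)
log_z = np.sum(0.5 * (u_hat[1:] + u_hat[:-1]) * np.diff(a_grid))

# Exact answer for this toy: the marginal of y is N(0, 2)
log_z_exact = -0.5 * np.log(2 * np.pi * 2.0) - y ** 2 / 4.0
print(log_z, log_z_exact)
```

The point of the toy is only the identity being used: the derivative of log Z(a) along the path is the expected log-likelihood under the tempered posterior, so estimating that expectation on a grid of a values and integrating recovers the normalizing-constant ratio that the marginal p(a) adjustment needs.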

Next, I think you could make the prior p(a) on a flat near a=1, with some nontrivial mass near a=0 and/or a=1. Then it seems like you could just sample from this new probability distribution and marginalize to the region a in [0.8, 1.2], and voila, you're done. Why do we need to estimate the normalization constant at all? What am I missing?
