And by “original” I mean “wrong.” Now I understand that the typo comes straight from the Kernel Adaptive Metropolis-Hastings paper. They say that “this distribution concentrates around the r0-circle with a periodic perturbation (with amplitude A and frequency ω),” which didn’t make sense… until now.

]]>Thanks. That explains it: you’re using a different density. (By the way, I found some code at https://github.com/yao-yl/path-tempering/tree/master/replication%20code%20for%20paper/Flower but it uses the original flower distribution, so I remained confused.)

]]>Carlos, I checked with my coauthor and there was a typo in the density formula. The actual density we used in the experiment was exp((…)^2), i.e., with an additional square.

]]>Interesting post! Another small typo: you are using “irreverent” when you mean “irrelevant”. Irreverent means “disrespectful, contemptuous, scornful” (purposefully dropping the country’s flag would be an irreverent act). Irrelevant = ¬relevant.

]]>This is what the (unnormalized) density looks like https://imgur.com/a/RELClbV and your “draws from the flower target” are not like that.

]]>The density corresponds to the joint density of x and y.

]]>Can you expand on the “path sampling” idea? I’m not quite sure what you mean by path sampling in this context.

]]>exp(mean(log(1 + month_return))) - 1 - mean(month_return)

From what I am seeing in the source, you wanted to write it as:

exp(mean(log(1 + month_return))) - 1 - mean(month_return) = 0
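For a concrete check (the numbers below are hypothetical, not from the post): the left-hand side is the gap between the geometric and the arithmetic mean return, and a second-order Taylor expansion of log(1 + r) predicts that gap to be about -var(r)/2 rather than exactly 0:

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical monthly returns: mean 1%, sd 5%
month_return = rng.normal(0.01, 0.05, size=100_000)

geom = np.exp(np.mean(np.log1p(month_return))) - 1.0  # geometric mean return
arith = np.mean(month_return)                         # arithmetic mean return

gap = geom - arith                    # the expression in question
taylor = -0.5 * np.var(month_return)  # second-order Taylor prediction: -var/2
```

So the expression is slightly negative rather than zero, and the Taylor approximation tracks it closely.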

]]>Maybe I’m missing something (what parameters are you using?) but the plots in Figure 9 do not seem to correspond to the density function you wrote just before.

]]>Yes, simulated tempering is not a new idea!

]]>p(q)^(1/t)

and t = 1 + (tmax-1) * inv_logit(l)

and l is given a prior normal(m,s)

Let’s unpack that a little. By adjusting m and s we can force l to stay in a certain region. If that region is entirely well below 0 (say, less than -5), then t = 1 for all l of interest, and we’re sampling from p(q)^1 = p(q), the posterior we really want. However, this posterior is presumably multi-modal, with lots of hard-to-sample regions. So if we shift and stretch the prior on l so that it spends more time near 0 or above, then t lies between 1 and tmax, and we’re sampling from p(q)^(1/t), a tempered distribution in which the energy gaps between different regions are relatively less severe. As t goes to infinity, p(q)^(1/t) goes to 1, and in that limit q can be anything.

In the end, we simply drop all the samples where t was greater than, say, 1.0001, and what remains are our samples from p(q).
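A minimal sketch of this scheme (all target choices and parameter values here are illustrative, not from the post): random-walk Metropolis on the joint (q, l), with a toy bimodal target, t = 1 + (tmax - 1) * inv_logit(l), a normal(m, s) prior on l, and, importantly, without the log-normalizing-constant correction c(t) that the post’s adaptive path sampling is designed to estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(q):
    # toy bimodal target: equal mixture of N(-5, 1) and N(5, 1)
    return np.logaddexp(-0.5 * (q + 5.0) ** 2, -0.5 * (q - 5.0) ** 2)

t_max = 50.0        # hottest temperature (illustrative)
m, s = -6.0, 3.0    # prior on l: normal(m, s), chosen so l visits both regimes

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_joint(q, l):
    t = 1.0 + (t_max - 1.0) * inv_logit(l)
    # tempered target p(q)^(1/t) times the prior on l;
    # no c(t) correction, so time spent at each t is imbalanced
    return log_p(q) / t - 0.5 * ((l - m) / s) ** 2

q, l = 0.0, m
kept = []
for _ in range(200_000):
    q_prop, l_prop = q + rng.normal(0.0, 1.0), l + rng.normal(0.0, 0.5)
    if np.log(rng.uniform()) < log_joint(q_prop, l_prop) - log_joint(q, l):
        q, l = q_prop, l_prop
    # keep only near-untempered draws (threshold loosened from 1.0001 for the toy)
    if 1.0 + (t_max - 1.0) * inv_logit(l) < 1.01:
        kept.append(q)

kept = np.array(kept)  # approximate draws from p(q), covering both modes
```

The part this sketch deliberately skips is exactly the point of the post: without an estimate of the normalizing constant at each temperature, the marginal distribution of l is distorted away from its prior, which is what the adaptive path sampling correction fixes.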

Now I think Yuling’s idea is a slight variation on this: instead of just tempering p(q), we use p(q)^(1-t) * b(q)^t with t now between 0 and 1, so we’re continuously deforming between p(q) and b(q). Another good idea, differing mainly in the details: we’re still ultimately just keeping the samples where t ~ 0, so we’re sampling from p(q).
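That continuous path between the two densities is a one-liner in log space (a sketch; the Gaussian `log_p` and `log_b` below are just stand-ins for any target and base log density):

```python
import numpy as np

def log_path(q, t, log_p, log_b):
    # geometric bridge: t = 0 recovers the target p(q), t = 1 the base b(q)
    return (1.0 - t) * log_p(q) + t * log_b(q)

# example endpoints: two (unnormalized) Gaussian log densities
log_p = lambda q: -0.5 * (q - 2.0) ** 2  # stand-in target
log_b = lambda q: -0.5 * q ** 2          # stand-in base
```

At t = 0 this is exactly log p(q) and at t = 1 exactly log b(q), with a smooth interpolation in between.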

]]>Phil, I always wish I could be a better writer! I made this post too complicated by trying to draw a deep connection between importance sampling and Taylor series expansion, which used to bother me a lot. That is why I started with a portfolio-return example.

Yes, Gelman and Meng (1998) covers most of the background on the relevant methods.

Annealing and tempering often refer to very similar ideas. Depending on whether there is an extra Metropolis–Hastings adjustment, they can be used both in optimization (to find the global mode) and in sampling (to draw from all modes).

y and -y are just a lazy way to write y=(10, -10).

Yuling,

Thanks, I did understand that part. It’s not a case of tl;dr, it’s ts;di (too stupid, didn’t understand). Maybe I simply need more background knowledge than I have, for example I am very familiar with simulated annealing but had never heard of simulated tempering.

And although I am extremely familiar with multi-modality I can’t think of a case in which I would simulate y and -y as being drawn from the same distribution. Probably this is mathematically equivalent to the way I’m used to thinking about this stuff, but that equivalence is not at all intuitive to me.

Also, I missed how (if at all) the bit about portfolio return is relevant to the rest of the post.

I do have multi-modal models I am interested in fitting but cannot currently fit, so I would like to understand this better, but I need a much more elementary introduction. Can you suggest a place to start? Maybe just Gelman and Meng 1998?

]]>It is a new feature in this path sampling framework. Currently the underlying Stan cannot read it, so we use an R interface that first converts this new syntax into a workable Stan program. But fitting multiple model blocks all at once can be of general interest even apart from computational concerns.

]]>:)

]]>Hi Christian, yes, there is always a great need for communication among researchers, especially across fields. It is unfortunate that people also use different vocabularies, and it is hard for Google to automatically translate “free energy” into “Bayes factor”.

]]>Bartels, C. & Karplus, M. (1997). Multidimensional Adaptive Umbrella Sampling: Applications to Mainchain and Sidechain Peptide Conformations. J. Comp. Chem. 18, 1450–1462.

…

Bartels, C., Widmer, A. & Ehrhardt, C. (2005). Absolute Free Energies of Binding of Peptide Analogs to the HIV-1 Protease from Molecular Dynamics Simulations. J. Comput. Chem. 26, 1294–1305.

Interesting also to see how these approaches have evolved for a long time in parallel in biophysics and stats.

Cheers

Christian

]]>The tl;dr version: we use adaptive path sampling to compute the log normalizing constant in simulated tempering. This enables a continuous sampler (Stan) to sample from multimodal densities.

]]>I’m hoping someone (perhaps Yuling) can write a brief summary that describes: (1) what problem is being solved, and (2) how.

I know that’s what the post itself is supposed to do, but I need a simpler version, if someone is willing to write that.

]]>