## Convergence diagnostics for Markov chain simulation

Pierre Jacob writes regarding convergence diagnostics for Markov chain simulation:

I’ve implemented an example of TV upper bounds for (vanilla) HMC on a model written in Stan, see here and here for a self-contained R script.

Basically, this creates a Stan fit object to obtain the target's density and gradient, and then provides a pure R implementation of some of the methods we're proposing with Niloy for a simple, non-dynamic HMC algorithm. The model is a logistic regression with random effects. The dimension is around 500, and the coupled HMC chains seem to contract quickly, with exact meetings typically occurring in fewer than 1000 iterations. From that, we obtain the proposed TV upper bounds. Of course the target is very simple; I wanted the whole script to run in a few minutes.
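As a rough illustration of how a TV upper bound can be read off meeting times, here is a hedged sketch (in Python with NumPy rather than R; the meeting times below are made up, and the exact form of the lag-L estimator should be checked against the paper, not taken from this snippet):

```python
import numpy as np

def tv_upper_bound(meeting_times, t, lag=1):
    """Estimate an upper bound on the TV distance between the chain at
    iteration t and its target, from meeting times tau of coupled chains,
    using the lag-L form  E[ max(0, ceil((tau - lag - t) / lag)) ].
    This is a sketch of the estimator's general shape, not the paper's code."""
    tau = np.asarray(meeting_times, dtype=float)
    return np.mean(np.maximum(0.0, np.ceil((tau - lag - t) / lag)))

# Hypothetical meeting times from, say, 100 coupled HMC runs:
rng = np.random.default_rng(1)
taus = rng.integers(low=50, high=1000, size=100)

# The bound decays to zero as t grows past the typical meeting time:
bounds = [tv_upper_bound(taus, t) for t in range(0, 1200, 200)]
```

The bound is non-increasing in t and hits zero once t exceeds all observed meeting times, which is what makes it usable as a burn-in diagnostic.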

Comments can be submitted until two weeks after the discussion meeting on December 11, that is, by December 28. If you know of people who might be interested in writing a comment on that paper, please spread the word.

In passing, I’ve also noticed a typo on this page in the leapfrog integrator section: if rho is Normal(0, Sigma), then the second step of the integrator should read theta <- theta + epsilon * Sigma^{-1} * rho. Also, I believe that "mass matrix" usually refers to the covariance of the momentum variable in the literature, e.g. "M" in "MCMC using Hamiltonian dynamics" by Radford Neal, which is contrary to what you write in the section "Auxiliary Momentum Variable".
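To make the corrected update concrete, here is a minimal sketch of one leapfrog step with that convention (in Python with NumPy; `grad_log_p` and `Sigma_inv` are placeholder names, not anything from the linked script):

```python
import numpy as np

def leapfrog(theta, rho, grad_log_p, eps, Sigma_inv):
    """One leapfrog step for HMC with momentum rho ~ Normal(0, Sigma).
    Note the position update uses Sigma^{-1} rho, matching the correction
    above; grad_log_p returns the gradient of the log target density."""
    rho = rho + 0.5 * eps * grad_log_p(theta)   # half step in momentum
    theta = theta + eps * (Sigma_inv @ rho)     # full step in position
    rho = rho + 0.5 * eps * grad_log_p(theta)   # half step in momentum
    return theta, rho
```

With a standard normal target and Sigma = I, repeated leapfrog steps should approximately conserve the Hamiltonian 0.5 * theta^2 + 0.5 * rho^2, which is a quick sanity check on the step order.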

New research and found a typo: good news all around.

P.S. Relatedly, here’s our recent paper on convergence diagnostics.

1. Both Matt Hoffman and Michael Betancourt independently recommended the Jacob, O’Leary and Atchadé coupling paper to me! I’m hoping the linked code gives me some kind of insight, because the theory in the paper is over my head (I don’t even understand the first assumption, about uniformly bounded moments).

This is related to our recent discussion of evaluating parallelization performance on the Stan forums. Michael was saying (either to me or on that thread) that the obstacle to running a thousand chains for 1 draw each without bias is confirming that you really have that one random draw. Apparently, this coupling stuff can help with that.

P.S. I created an issue for our doc repository so we’ll remember to fix the doc. Anyone with a GitHub account can submit an issue there, which has the advantage that it gets attributed through GitHub, won’t get garbled in translation, and won’t be delayed by Andrew’s blog lag.

• Thanks both for spreading the word on this; we hope to have an exciting discussion published along with the paper.

Bob, sorry about the technical aspects of the paper; we did try hard to make this as simple to read as possible, believe it or not. There are some slides here in case it helps: https://www.slideshare.net/pierrejacob/unbiased-mcmc-with-couplings
Our primary practical motivation was indeed parallel computation, as in many machines with possibly slow communication between them.

Without looking at the theory in detail, we hope it’s at least clear that the proposed algorithms are implementable in some generality, without knowing e.g. the value of “eta” in Assumption 1 or of “delta” in Assumption 2. This is one of the key differences from other previously proposed ways of removing the bias from MCMC algorithms. The other key difference is that we can control the loss of efficiency that results from removing the bias by tuning the integers k and m.
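For readers who want to see where k and m enter, here is a hedged Python sketch of the time-averaged estimator's general shape as I read it (the function names and the chains `X`, `Y` with meeting time `tau` are placeholders; consult the paper for the authoritative form):

```python
import numpy as np

def unbiased_estimate(h, X, Y, k, m, tau):
    """Sketch of a time-averaged unbiased estimator built from a pair of
    coupled chains (X_t), (Y_t) that meet at time tau (X_t = Y_{t-1} for
    t >= tau):
        H_{k:m} = (1/(m-k+1)) sum_{t=k}^{m} h(X_t)
                + sum_{t=k+1}^{tau-1} min(1, (t-k)/(m-k+1)) * (h(X_t) - h(Y_{t-1}))
    The first term is an ordinary MCMC average; the second is the bias
    correction, which vanishes if the chains have met by time k+1."""
    mcmc_avg = np.mean([h(X[t]) for t in range(k, m + 1)])
    correction = sum(
        min(1.0, (t - k) / (m - k + 1)) * (h(X[t]) - h(Y[t - 1]))
        for t in range(k + 1, tau)
    )
    return mcmc_avg + correction
```

Increasing k and m makes the correction term contribute less often, so the estimator behaves more like a plain ergodic average; that is the efficiency trade-off being tuned.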

I’ll keep that in mind regarding submitting bug reports via GitHub; sorry about that.

• I think slide 8 is really great at illustrating the short-chain bias. But I got lost during the key argument on pages 12 and 13. In particular, I don’t know why the last line of page 12 follows. Hopefully Andrew will help me work through this, and I’ll also work through the code. The code looks pretty straightforward until it gets to the use of pipes from packages I don’t know.