[image of Schrodinger’s cat, of course]

Stan collaborator Michael Betancourt wrote an article, “The Convergence of Markov chain Monte Carlo Methods: From the Metropolis method to Hamiltonian Monte Carlo,” discussing how various ideas of computational probability moved from physics to statistics.

Three things I wanted to add to Betancourt’s story:

1. My paper with Rubin on R-hat, that measure of mixing for iterative simulation, came in part from my reading of old papers in the computational physics literature, in particular Fosdick (1959), which proposed a multiple-chain approach to monitoring convergence. What we added in our 1992 paper was the within-chain comparison: instead of simply comparing multiple chains to each other, we compared the between-chain variance to the within-chain variance. This enabled the diagnostic to be much more automatic.

2. Related to point 1 above: It’s my impression that computational physics is all about hard problems, each of which is a research effort on its own. In contrast, computational statistics often involves relatively easy problems—not so easy that they have closed-form solutions, but easy enough that they can be solved with suitable automatic iterative algorithms. It could be that the physicists didn’t have much need for an automatic convergence diagnostic such as R-hat because they (the physicists) were working so hard on each problem that they already had a sense of the solution and how close they were to it. In statistics, though, there was an immediate need for automatic diagnostics.

3. It’s also my impression that computational physics problems typically centered on computation of the normalizing constant or partition function, Z(theta), or related functions such as (d/dtheta) log(Z). But in Bayesian statistics, we usually aren’t so interested in that. Indeed, we’re not really looking for expectations at all; rather, we want random draws from the posterior distribution. This changes our function, in part because we’re just trying to get close, we’re not looking for high precision.
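
To make the comparison in point 1 concrete, here is a minimal sketch of a basic R-hat calculation for a single scalar quantity. It is a simplified illustration, not the exact procedure from the 1992 paper or the one in Stan: it omits refinements such as splitting chains or degrees-of-freedom corrections. The function name `rhat` and the simulated chains are assumptions made up for this example.

```python
import numpy as np

def rhat(chains):
    """Basic potential scale reduction factor for one scalar quantity.

    chains: array of shape (m, n), i.e. m chains with n draws each.
    Simplified sketch: no chain splitting, no degrees-of-freedom correction.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance B: n times the sample variance of the chain means.
    B = n * chain_means.var(ddof=1)
    # Within-chain variance W: average of the per-chain sample variances.
    W = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the marginal variance, mixing W and B.
    var_plus = (n - 1) / n * W + B / n
    # Values near 1 suggest the chains agree; larger values suggest poor mixing.
    return np.sqrt(var_plus / W)

# Hypothetical test data: four chains that agree, and four where two are offset.
rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])

rhat_mixed = rhat(mixed)
rhat_stuck = rhat(stuck)
```

In this toy setup the chains that share a distribution give a value close to 1, while the set with two shifted chains gives a value well above 1, which is the sense in which the ratio can be checked automatically without knowing the answer in advance.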

I think points 1, 2, and 3 help to explain some of the differences between the simulation literatures in physics and statistics. In short: physicists work on harder problems so they’ve developed fancier algorithms; statisticians work on easier problems and so have made more advances in automatic methods.

And we can help each other! From one direction, statisticians use physics-developed methods such as Metropolis and HMC; from the other, physicists use Stan to do applied modeling more effectively so that they are less tied to specific conventional choices of modeling.

We do typically use those posterior draws to compute expectations (for parameter estimates, posterior predictive distributions, and event probabilities).

Isn’t the reason the physicists seem to be worried about Z that they use models like Markov random fields where the normalizing constant Z depends on the parameters?

From what I’ve seen (largely through the lens of Michael), the kinds of models physicists work on aren’t much different from those other statisticians work on. They’re just more physically grounded, which helps with formulating priors and interpreting results.

Bob:

In statistical physics there are easy problems and hard problems. The easy problems get solved and we don’t think about them anymore; the hard problems get the attention. The Ising model is a famous hard problem: it’s a discrete model with phase transitions, requiring specialized algorithms to draw from the distribution; it’s a counterexample to the folk theorem of statistical computing.

In statistics, yes, there are some hard problems, but there are lots of easy problems that we have to keep solving over and over again. Linear regression, logistic regression, multilevel models, etc. Sampling from the posterior distributions of these models is not so hard.

In the world of physics you might have one model, one particular model, and spend years on it. In the world of statistics we have lots of little problems and we want to develop general tools to solve them without much user effort.

People even build special computers to run specific simulations. This group has been doing so for over twenty years (I worked with them long, long ago): https://arxiv.org/abs/1310.1032

Andrew, wouldn’t you perhaps say that you left the field of computational statistics and now most of the sampling problems that you encounter in your research are relatively easy problems? One might find your blog entry slightly disrespectful for the active field of research named “computational statistics”…

I encourage readers interested in computational statistics (Monte Carlo, MCMC, SMC, computing Z, etc) to look up the list of presentations at conferences such as MCQMC or MCM’Ski (renamed Bayes Comp), both occurring every two years, or a good subset of JRSS B, JASA, NIPS and ICML papers (among others) to gather some ideas of not-so-easy problems in the field, and of which researchers are at the forefront of computational statistics.

Pierre:

No, I have not left the field of computational statistics!

And no need to take offense: when I say “computational statistics often involves relatively easy problems,” there’s no disrespect intended. Relatively easy problems are important too. And it’s a hard problem to come up with algorithms that will work automatically for large classes of easy problems.

Indeed, but what is offensive is that you say “In statistics, we […]” do this and that, but in fact you describe what you, in particular, do, which might not be very representative of Statistics, nor of Bayesian statistics or computational statistics. For instance, in this particular blog post you describe estimating normalizing constants as not very interesting in Bayesian statistics. I would point the readers to this workshop

https://warwick.ac.uk/fac/sci/statistics/crism/workshops/estimatingconstants/

hosted by a stats department, not a physics department, less than 2 years ago.

Since your blog is very popular, this type of confusion leads to other researchers in the field having to justify e.g. why we don’t simply use Stan instead of developing new algorithms. This happens to me regularly. Kind of annoying after a while.

Pierre:

Thanks for sharing your perspective. When blogging I offer only my own perspective, and it’s important that we have a comments section where others can offer their views. So it’s very helpful that you are commenting here, and I appreciate it.

In my above post, I wrote, “computational statistics often involves relatively easy problems.” I think this is indeed the case. Or, to put it another way, computational statistics often involves the hard problem of coming up with algorithms that automatically solve large numbers of easy problems. Computational physics often involves the problem of coming up with a single algorithm that, with great effort, solves a small class of hard problems.

But I’m in 100% agreement with you on your main point, which is that researchers work on all sorts of problems. Even if computational statistics often involves the development of automatic methods for relatively easy problems, that should not at all detract from the research being done in hard problems. Indeed, I can only be talking above about general tendencies, as every problem in computational statistical physics can anyway be thought of as a mathematical or statistical problem.

Just by analogy: Suppose someone said to me that 99.9% of applied statistics is the computation of p-values. I could well reply: Sure, but that’s not what I do! And if someone came to me and said that I should just stop doing what I’m doing and compute p-values instead, then I’d be annoyed too.

In my above post I was not trying to discourage anyone from research into the computation of normalizing constants; rather, I was talking about some general differences between different fields, and how the different problems that people work on can help us understand some differences in the literatures.

Ok, cool then. Sorry to be always ranting and thanks for the encouragement to keep on commenting.

Pierre:

I don’t think you were ranting. But even if you were, that’s fine: ranting is what blogging’s all about!

Various Stan developers attend most of these meetings. And I think we’re overall pretty aware of what’s going on. Andrew is working on all kinds of new algorithms for multi-modal posteriors, stochastic marginalization, and variational inference. Michael’s also been very active in pushing the boundaries of HMC.

I think what Andrew’s getting at is that these meetings aren’t really aimed at building software that most statisticians use. Most statisticians are using mainstream pre-written tools like glm() or the lme4 package or their equivalent in Stata or SAS, or something specialized for a given domain like NONMEM. We’re trying to aim at that kind of audience with packages like RStanArm and for an audience halfway in between that audience and Bayes Comp with something like Stan itself.

When we write grant proposals to work on Stan, we get dinged for not aiming to solve the kinds of problems tackled by papers at NIPS. We try to motivate that we’re fitting more elaborate models robustly (proposing things like internal online diagnostics and posterior predictive checks), but that doesn’t seem to go over too well with the crowd that wants brand new algorithms.

Bob, if you’re interested I recommend the book Free Energy Computation: a Mathematical Perspective

http://www.worldscientific.com/worldscibooks/10.1142/P579

In it, you’ll find various interpretations of the normalizing constant in chemistry, molecular biology and physics (and most importantly, references).

Thanks, but that’s a bit more math and physics than I can easily digest!

Apologies in advance for being a pedant, but … while the Betancourt article was a very informative historical account, it has so many missing words and number/tense errors that it became an unnecessary slog, though it was clearly meant as an engaging introduction to an important topic. Sentences like “The success of these of Markov chain Monte Carlo, however, contributed to its own demise.” kept throwing me off the trail (no doubt because my bulb is considerably dimmer than Betancourt’s). Fixing these admittedly minor errors might help this paper reach, and enrich, a wider audience.

Anyway, to the extent I understand any of this, unless there is some sort of immutable distribution of the distributions that account for variability among living things I suspect that physicists will always have it easier. The hydrogen atom is not (so far as I know) trying to exploit its environment by changing the distributions of its lonely electron’s various orbitals; yet an increasing number of papers are demonstrating the ability of living things to skew, etc. the phenotype of offspring to exploit detected changes in energy, predation, population density, mate availability, etc. in Mom’s environment.

I’m curious where the field will go. In astrostatistics, we are still having issues with improper likelihoods being used, not paying attention to regularity conditions on LRTs, and so on. I personally became really enthusiastic about Z when I learned of MultiNest. “Wow, I can properly compute the odds of one model to another!” But priors are still tricky in physics, and we run into the case of comparing physical models to empirical ones (power laws vs numerically computed spectra, for example). So, I’m currently stuck on how/when to use Z versus predictive criteria like DIC and WAIC to compare models.

On another note, it is nice to see hierarchical modeling becoming more popular in astrostatistics. But for the tough problems, things get really non-linear and GLM structure is not applicable. So non-centered approaches, prior scales, etc. become really difficult. Hopefully, as the technique spreads through the field, we will see more tutorial-style examples of how to deal with these models.