Ryan Giordano wrote:

Last year at StanCon we talked about how you can differentiate under the integral to automatically calculate quantitative hyperparameter robustness for Bayesian posteriors. Since then, I’ve packaged the idea up into an R library that plays nice with Stan. You can install it from this github repo. I’m sure you’ll be pretty busy at StanCon, but I’ll be there presenting a poster about exactly this work, and if you have a moment to chat I’d be very interested to hear what you think!

I’ve started applying this package to some of the Stan examples, and it’s already uncovered some (in my opinion) serious problems, like this one from chapter 13.5 of the ARM book. It’s easy to accidentally make a non-robust model, and I think a tool like this could be very useful to Stan users! As of right now I’m the only one who’s experimented with it (AFAIK), but I think it’s ready for some live guinea pigs.

It’s something I came upon while working on robustness for variational Bayes, but it doesn’t have anything to do with VB. It’s based on a very simple and old idea (nothing more than differentiating under the integral of an expectation). But do I think it’s now timely to re-visit the idea now that we have an MCMC toolkit built with easy-to-use automatic differentiation!

I sent this to some Stan people.

Dan Simpson wrote:

One comment about the lineariseation: it’s basically parameterisations dependent, so it’s not “userproof.” A few friends of mine made a local geometry argument for these types of things in here.

Ryan replied:

This is certainly true! That it’s not userproof is a point that can’t be said enough.

Linearization does, however, has the benefit of being relatively easy for ordinary users to understand. In particular, it’s easy to understand its limitations. The limitations of worst-case priors in a unit ball of a functional metric space are harder to wrap your head around, especially if you’re a Stan user trying to fit a model to your sociology data set.

I might add that, if this package does anything distinctively useful, it’s that it manages to pummel the Stan API into giving it the derivatives it needs. A richer set of features for interacting with the C++ autodiff library through the Stan modeling language would make this package kind of trivial.

And Aki asked:

How do you interpret the robustuness measure? How small value is negligible and how large is too big?

Ryan replied:

A full answer is in appendix C of our our paper. In short, I’d expect users to have in mind a reasonable range of their hyperparameters.

But here’s an answer in short. Also see this picture:

A central assumption—currently only verifiable by actually re-fitting*—is that the expectations depend linearly on the hyperparameters over this range. Given the sensitivity, you can estimate how many posterior standard deviations the expectation would very by across the plausible range of hyperparameters. If the expectation varies too much (e.g., by more than 1 posterior sd), then there’s a problem.

So there are a number of things that the user still has to decide:

1) How much hyperparameter variability is reasonable?

2) Do I have reason to believe that linearity is reasonable over this range?

3) How much variability in the expectation is acceptable?I think the answers to this question will vary considerably from problem to problem, and we’ve made no attempt to answer them automatically (except to normalize automatically by the posterior standard deviation).

* You can calculate higher-order derivatives wrt the hyperparameter, but in general you’ll need higher order derivatives of the log probability which are not now exposed in rstan. I also expect MCMC error will start to become a real problem. I haven’t actually experimented with this.

All this is a followup from our conversation regarding static sensitivity analysis, an idea that I had way back when but have never really integrated into my workflow. I’m hoping that Ryan’s methods will make it easy to do this in Stan, and this will give us one more tool to understand how our models are working.

My first thought is that some importance weighting or importance sampling could help move beyond the linearity approximation.

It’s true that importance sampling doesn’t require linearity. The problem is that importance sampling fails precisely when the new posterior (with changed hyperparameters) doesn’t overlap much with the original posterior, which is exactly the case you need to know about when robustness is a problem.

In contrast, linearity can hold far beyond the mass of the original posterior. Consider, for example, the sensitivity of a multivariate normal parameter to its prior mean — the dependence is linear for all values of the prior mean, even when there’s no overlap at all and importance sampling fails completely.

With exactly your thought in mind, I actually initially implemented importance sampling as a sanity check for linearity in this package, but took it out of the API for exactly the above reasons. It’s just not very useful in cases of real non-robustness. (I’d be happy to put it back in there’s a demonstrable need!)

Incidentally, the linear approximation we describe is also a linear approximation to the importance sampling estimate — see appendix B of our paper, Covariances, Robustness, and Variational Bayes, for a proof: https://arxiv.org/pdf/1709.02536.pdf