Bob Carpenter writes:

Here’s what we do and what we recommend everyone else do:

1. code the model as straightforwardly as possible

2. generate fake data

3. make sure the program properly codes the model

4. run the program on real data

5. *If* the model is too slow, optimize *one step at a time* and for each step, go back to (3).
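The fake-data loop in steps (2)–(3) can be sketched in a few lines of Python (standing in for a Stan fit; all names and numbers here are illustrative, not Stan code): simulate data from known parameters, fit, and check the truth is recovered.

```python
import random
import statistics

# Hypothetical stand-in for steps (2)-(3): simulate fake data from known
# parameters, fit with a simple estimator, and check that the truth is
# recovered. Everything here is illustrative, not actual Stan output.
random.seed(42)

true_mu, true_sigma = 2.0, 1.5
n = 10_000
y = [random.gauss(true_mu, true_sigma) for _ in range(n)]

# Stand-in for the fitted model: with a flat prior and large n, the
# posterior mean of mu is approximately the sample mean.
mu_hat = statistics.fmean(y)
sigma_hat = statistics.stdev(y)

# Recovery check: estimates should fall within a few Monte Carlo
# standard errors of the simulated truth.
mc_se = true_sigma / n ** 0.5
assert abs(mu_hat - true_mu) < 5 * mc_se
assert abs(sigma_hat - true_sigma) < 0.1
```

If the recovered estimates land far outside Monte Carlo error, the bug is in the program, not the data.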

The optimizations can be of either the statistical or the computational variety. Slow fitting can be due to statistical problems with the parameterization (requiring too many iterations) or to slow code (each log density and derivative evaluation taking too long).

Many people seem to stop at step 2.

This post is a bit premature in that Sean Talts and crew (Michael Betancourt, Dan Simpson, Andrew Gelman) are about to roll out a revised form of simulation-based calibration (née the Cook-Gelman-Rubin diagnostics; I’ll leave the acronymming to them). That’s one approach to making the calibrations in (3) rigorous. It turns out that running it naively (the way the Cook et al. paper says to run it) produces discretization and size artifacts in the visualizations that emerge for larger runs. Sean et al. figured out that switching to different statistics avoids the problem (I think Dan dropped a hint to use ranks instead when he was in town recently). It’s nice working with such a great gang of statisticians!
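For concreteness, here is a minimal Python sketch of a rank-based calibration check (my reading of the hint about ranks, not the authors’ actual procedure), using a toy conjugate model where exact posterior draws are available:

```python
import random

# Toy simulation-based calibration with rank statistics, for the
# conjugate model  theta ~ Normal(0, 1),  y | theta ~ Normal(theta, 1),
# whose exact posterior is  theta | y ~ Normal(y / 2, sqrt(1 / 2)).
# The procedure and constants are illustrative, not the authors' method.
random.seed(1)

L = 99        # posterior draws per replication
reps = 2000
ranks = []
for _ in range(reps):
    theta = random.gauss(0.0, 1.0)                  # draw from the prior
    y = random.gauss(theta, 1.0)                    # simulate fake data
    post = [random.gauss(y / 2, 0.5 ** 0.5) for _ in range(L)]
    ranks.append(sum(d < theta for d in post))      # rank in 0..L

# If the "sampler" is calibrated, the ranks are uniform on {0, ..., L},
# so the mean rank should be near L / 2.
mean_rank = sum(ranks) / reps
assert abs(mean_rank - L / 2) < 3.0
```

With a biased sampler (say, posterior draws shifted away from the true conditional), the rank histogram would pile up at one end instead of staying flat.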

I like it! However, every time I read things like “optimize *one step at a time*,” I also hear echoes of my DOE instructor talking about the perils of “OFAT.” Do designed experiments ever have a role in optimizing Stan models, or is the cost of coding up that many alternatives simply too great?

Lots of flipping back and forth between the steps. But I swear this workflow in Stan has made me a better applied researcher. I make a ton of mistakes. I get frustrated with sorting out where I’m getting divergent transitions (underflow to -Inf again!). But you sort it out after enough trial and error and in the end you understand your model more deeply.

I’d only add that when coming up with a custom likelihood function, I’ve found it very useful to supply fake parameters in the generated quantities block and double-check that my code returns the proper log-likelihood value. This would fall somewhere between 1 and 3. I run lots of print statements to find any coding errors that might result in NaN or -Inf showing up.
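A language-agnostic version of that check, sketched in Python (`normal_lpdf` is a made-up example name): evaluate the hand-coded log density at fixed parameter values and compare it against a trusted reference implementation.

```python
import math
from statistics import NormalDist

# Evaluate a hand-coded log density at fixed "fake" parameters and
# compare to a trusted reference; normal_lpdf is a hypothetical example
# of the kind of function you would then port to a Stan model.
def normal_lpdf(y, mu, sigma):
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

y, mu, sigma = 1.3, 0.5, 2.0
mine = normal_lpdf(y, mu, sigma)
reference = math.log(NormalDist(mu, sigma).pdf(y))
assert math.isclose(mine, reference, rel_tol=1e-12)
```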

Thanks—all good advice.

The print statements were introduced for just this purpose—glad you’re finding them useful. You can also use reject statements to error-check the inputs of functions you reuse.
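The same defensive pattern, sketched in Python rather than Stan (an exception standing in for Stan’s reject statement; the function name is just an illustrative example): validate inputs up front so a reused function fails loudly instead of silently producing NaN or -Inf.

```python
import math

# log1m(x) = log(1 - x) is defined only for x < 1; the explicit check
# below is the Python analogue of a Stan reject() statement guarding a
# reused function.
def log1m(x):
    if not x < 1.0:
        raise ValueError(f"log1m requires x < 1; found x = {x}")
    return math.log1p(-x)

assert math.isclose(log1m(0.5), math.log(0.5))

caught = False
try:
    log1m(1.5)          # invalid input is rejected loudly...
except ValueError:
    caught = True
assert caught           # ...instead of quietly returning nan
```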

The point isn’t just understanding divergences, though understanding the posterior geometry is helpful both statistically (thinking about the model’s assumptions) and computationally (efficiency and robustness). The main problem is that you can get biased posteriors from Gibbs or Metropolis that aren’t diagnosed by statistics like R-hat (or by running one chain four times as long, for that matter). One of the advantages of Stan is the in-sampler warnings for things like divergences.
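For readers who haven’t seen it, here is a minimal sketch of the basic potential scale reduction statistic R-hat, and of the kind of multimodal failure it does catch; this is the textbook formula, not Stan’s exact split-chain implementation.

```python
import random
import statistics

# Basic potential scale reduction (R-hat) for m chains of n draws each:
# compare between-chain variance of the means to within-chain variance.
def rhat(chains):
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)      # between
    w = statistics.fmean(statistics.variance(c) for c in chains)  # within
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5

random.seed(7)
good = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
bad = [[random.gauss(mu, 1) for _ in range(1000)] for mu in (0, 0, 3, 3)]
assert rhat(good) < 1.05   # well-mixed chains agree
assert rhat(bad) > 1.5     # chains stuck in different places
```

The failure mode Bob describes is the converse: chains that agree with each other (low R-hat) while all sampling from the same wrong distribution, which no between-chain comparison can detect.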

You can also test functions by exposing the functions themselves in R. With some of the work we’ve had contributed, this is going to get more robust in RStan 2.17 or 2.18 (not sure when it got merged—2.17.x is the last pre-C++11 release).

Types for all the variables really helps with readability and error checking, especially as programs get large. It’s one of those things that makes easy cases a bit harder (or at least more verbose), but makes harder things easier. I thought the combination of typing for variables and having to declare data vs. parameters (knowns vs. unknowns) would really annoy people, but overall, it hasn’t been so bad—I think because both make programs easier to read (even if the declaration of data vs. parameters makes them less flexible than BUGS programs in this regard).

I like this! I recently started a new project and used these steps, which helped me see whether the model would blow up. Now I’m trying to figure out how to make it run faster, but I still get hung up on how to optimize. The Stan users are super helpful in pointing out places where I can vectorize, but I’m still not always sure how to proceed.

Like @Dalton moving over to Stan has made me a better researcher.

There’s a chapter in the manual on arrays vs. vectors and one specifically aimed at optimization. Models that match the data well can be faster to mix and get reasonable effective sample sizes (on a per log density evaluation basis) than simpler models that don’t match the data well. Otherwise, the main goal is to cut down on the size of the expression graph and the complexity of the operations going into it (don’t recompute anything, drop constants).
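The “don’t recompute anything, drop constants” advice can be sketched in Python (standing in for what Stan’s autodiff expression graph sees; function names here are illustrative): hoist data-independent terms out of the likelihood loop.

```python
import math
import random

# Two equivalent normal log likelihoods: a naive version that recomputes
# constants for every data point, and one that hoists them out of the
# loop. In Stan the hoisted form builds a much smaller expression graph.
def lp_naive(y, mu, sigma):
    return sum(-0.5 * math.log(2 * math.pi) - math.log(sigma)
               - 0.5 * ((yi - mu) / sigma) ** 2 for yi in y)

def lp_hoisted(y, mu, sigma):
    const = -len(y) * (0.5 * math.log(2 * math.pi) + math.log(sigma))
    inv_2s2 = 0.5 / sigma ** 2
    return const - inv_2s2 * sum((yi - mu) ** 2 for yi in y)

random.seed(3)
y = [random.gauss(1.0, 2.0) for _ in range(100)]
assert math.isclose(lp_naive(y, 1.0, 2.0), lp_hoisted(y, 1.0, 2.0))
```

Both return the same value; the second just does n fewer logarithms and, in Stan, puts far fewer nodes on the autodiff stack.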

I’ll check it out. Thanks Bob!

6. And smile in 30 second bursts as you get close to finishing; it’ll make you 2.78% more efficient: https://nytimes.com/2017/12/07/well/move/running-smiling-performance.html