“Model takes many hours to fit and chains don’t converge”: What to do? My advice on first steps.

The above question came up on the Stan forums, and I replied:

Hi, just to give some generic advice here, I suggest simulating fake data from your model and then fitting the model and seeing if you can recover the parameters. Since it’s taking a long time to run, I suggest just running your 4 parallel chains for 100 warmup and 100 saved iterations and setting max treedepth to 5. Just to get things started, cos you don’t want to be waiting for hours every time you debug the model. Otherwise it’s like when I took a computer science class in 1977 and we had to write our code on punch cards and then wait hours for it to be run through the computer.
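
For example, the quick debugging loop might look something like this in cmdstanpy (a minimal sketch: the normal model, the file name model.stan, and the parameters alpha and sigma are made-up placeholders, not the poster’s actual model):

    import numpy as np
    from cmdstanpy import CmdStanModel

    # Simulate fake data from known parameter values, so we know what
    # the fit should recover. (Model and parameter names are made up.)
    rng = np.random.default_rng(1)
    true_alpha, true_sigma = 2.0, 1.5
    N = 200
    y = rng.normal(true_alpha, true_sigma, size=N)

    # Short, cheap run just to get the debugging loop started.
    model = CmdStanModel(stan_file="model.stan")
    fit = model.sample(
        data={"N": N, "y": y},
        chains=4,
        iter_warmup=100,     # 100 warmup iterations
        iter_sampling=100,   # 100 saved iterations
        max_treedepth=5,     # cap treedepth so each iteration stays cheap
        seed=1,
    )
    print(fit.summary())     # are alpha and sigma roughly recovered?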

P.S. Commenter Gec elaborates:

In my [the commenter’s] experience, I treat an inefficient model as a sign that I don’t really understand the model. Of course, my lack of understanding might be “shallow” in that I just coded it wrong or made a typo. But typically my lack of understanding runs deeper, in that I don’t understand how parameters trade off with one another, whether they lead to wonky behavior in different ranges of values, etc.

While there is no one route to improving this understanding, some of it can come from finding analytic solutions to simplified/constrained versions of the full model. A lot comes from running simulations, since this gives insight into how the model’s behavior (i.e., patterns of data) relate to its parameter settings. For example, I might discover that a fit is taking a long time because two parameters, even if they are logically distinct, end up trading off with one another. Or that, even if two parameters are in principle identifiable, the particular data being fit doesn’t distinguish them.

It might seem like these model explorations take a long time, and they do! But I think that time is better spent building up this understanding than waiting for fits to finish.
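
To make the trade-off point concrete, here’s a minimal sketch in plain NumPy (the toy model and all numbers are made up) of two logically distinct parameters that the data can’t tell apart:

    import numpy as np

    # Toy model in which two parameters trade off: y ~ Normal(a + b, 1).
    # a and b are logically distinct, but the data only constrain their sum.
    rng = np.random.default_rng(7)
    true_a, true_b = 1.0, 2.0
    y = rng.normal(true_a + true_b, 1.0, size=100)

    def log_lik(a, b):
        return -0.5 * np.sum((y - (a + b)) ** 2)

    # The log-likelihood is constant along lines a + b = const: a ridge,
    # which is exactly what makes samplers slow and posteriors correlated.
    print(np.isclose(log_lik(0.0, 3.0), log_lik(3.0, 0.0)))   # True
    print(np.isclose(log_lik(-5.0, 8.0), log_lik(8.0, -5.0)))  # True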

Exactly. Workflow, baby, workflow.

7 thoughts on ““Model takes many hours to fit and chains don’t converge”: What to do? My advice on first steps.”

    • I remember having to wait until the next day to find out I did something wrong in FORTRAN, since we could only run our student jobs at night. I guess I’m older and/or went to a poorer state university rather than the Ivy League.

  1. [Gec’s comment, quoted in full in the P.S. above.]

    • True enough. One of the things I’ve discovered is that in the context of Bayes, it’s often the case that your model can *fit too well* and also that things which work well in a deterministic sense can be terrible in the Bayesian context.

      The issue with fitting too well is that it produces deep wells in potential energy that are difficult to get out of… imagine you’re trying to place your golf ball at the bottom of a 7-mile-deep oil well surrounded by mile-wide, 2000-foot-deep craters… it’s easy to get stuck in one of the craters because they’re wide and relatively shallow, but still deep enough that it’s a lot of work to hike out of them, and the narrow, super-deep well is hard to find.

      In the context of what works in deterministic settings relative to Bayesian ones, here’s an example: I tried to fit a simple function to a seasonally noisy economic time-series. I know that in the deterministic *interpolation* context, global radial basis functions with wide flat regions converge exponentially with the number of centers. So I created such a thing, added a seasonal sine wave to account for the dominant annual seasonality, and tried to fit it.

      Well, it wouldn’t fit for crap. First off, perturbations to *any* of the coefficients changed the *entire* function, so the fit required tiny, tiny timesteps and produced highly correlated posteriors. Second, it suffered from the “deep well” problem above: it would find some fit that was quite good but clearly not really right, and then never be able to move out of it.

      I fixed the problem by doing two things:
      1) Using compact radial basis functions, so that changing any given coefficient had zero effect on regions of the time-series outside a certain reasonable radius (a year or so)

      2) Eliminating the seasonal sine wave and leaving it as part of the error term… relying only on the RBFs to find the non-periodic component of the time-series without estimating a precise seasonality coefficient.

      Those two things changed my problem from something that would never in a million years fit into something where I could get the answer to the question of interest within seconds.
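
      Here is a minimal sketch of that compact-vs-global contrast in plain NumPy (the Gaussian and Wendland-type bases and all settings below are illustrative assumptions, not the commenter’s actual code):

        import numpy as np

        # Global Gaussian RBFs vs. a compactly supported (Wendland-type) RBF.
        # With compact support, perturbing one coefficient changes the curve
        # only within that basis function's radius; a global basis changes it
        # everywhere, which is what couples all the coefficients in the fit.
        t = np.linspace(0.0, 10.0, 1001)       # ten "years" of time
        centers = np.arange(0.5, 10.0, 1.0)    # one center per year

        def gaussian_rbf(t, c, scale=2.0):     # global: never exactly zero
            return np.exp(-((t - c) / scale) ** 2)

        def wendland_rbf(t, c, radius=1.0):    # compact: zero beyond radius
            r = np.abs(t - c) / radius
            return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

        def curve(basis, coefs):
            return sum(w * basis(t, c) for w, c in zip(coefs, centers))

        coefs = np.ones(len(centers))
        bumped = coefs.copy()
        bumped[4] += 1.0                       # perturb a single coefficient

        far = np.abs(t - centers[4]) > 1.0     # points beyond one radius
        for basis in (gaussian_rbf, wendland_rbf):
            delta = np.abs(curve(basis, bumped) - curve(basis, coefs))
            print(basis.__name__, delta[far].max())
        # gaussian_rbf: sizable changes far from the center; wendland_rbf: 0.0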

      • Thanks for the example!

        I just wanted to second your comment on how different estimation methods often change how you have to think about a model. Though I use Stan frequently these days, it took me a long time to appreciate it. I was used to specifying Bayesian models in a way that was efficient for Gibbs samplers and didn’t appreciate the complexity of the fitness landscape. At first, my Stan models were a total bust, but once I got my head around how Hamiltonian sampling “sees” the parameter space, things became a lot clearer.
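
        The canonical example of this is the non-centered parameterization of a hierarchical model. A minimal sketch of the transform in plain NumPy, just to show the idea (mu, tau, eta, and theta here are generic names; in Stan you would declare eta as the parameter and build theta in transformed parameters):

          import numpy as np

          # Centered vs. non-centered parameterizations of theta_j ~ Normal(mu, tau).
          # Both define the same distribution, but when tau is small the centered
          # form gives HMC a funnel-shaped landscape, while the non-centered form
          # keeps the sampled quantities (mu, tau, eta) nearly independent.
          rng = np.random.default_rng(3)
          mu, tau, J = 0.0, 0.1, 8

          theta_centered = rng.normal(mu, tau, size=J)   # sample theta directly

          eta = rng.normal(0.0, 1.0, size=J)             # sample eta ~ Normal(0, 1)
          theta_noncentered = mu + tau * eta             # then shift and scale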

        In essence, I learned to try to specify my models in such a way that the landscape reveals rather than obscures properties of the model. How very Zen!
