We were having a conversation about differential equation models in pharmacometrics, in particular how to do efficient computation when fitting models for dosing, and Sebastian Weber pointed to this Stancon presentation that included a single-dose model.

Sebastian wrote:

Multiple doses lead to a quick explosion of the Stan codes – so things get a bit involved from the coding side.

– In practical applications I [Sebastian] would by now try to integrate over the dosing events. This allows you to avoid making the initials being vars such that you can a lot of speed. The trick is to make the ODE integrate “see” the dosing events and prevent the integrator to step over it. The cheap hack to do that is to insert a few observation points around the dosing event. This will make the integrator stop at the dosing time and make it realize that there are sudden changes in one of the compartments. This is not really clean, but CVODES handles it well and sundials even documents this technique as one way to handle it.

– I think we should one more time try the adjoint stuff. As we are about to get closures in Stan it should be straightforward to give to the integrator in Stan a log-prob function along with the ODE stuff. Once we have that, then we should be able to create something which is a lot faster than what we have now.

– Last time you tried the adjoint stuff your profiling showed that std::vector did stand out. This is because our AD system is darn slow to get the Jacobian of the ODE RHS. We should instead use an analytic derivative here – not sure how to automatically generate it ATM, but you get 2-3x speedups on the existing ODE system when you do that (and I have code around which does the automatic generation of the symbolic Jacobian…but this code is a well working prototype, not more)… hopefully the OCAML parser can churn these algebraic derivatives out.

– The other thing to explore is sparsity in the ODE RHS. Many ODE systems have a banded structure. One way to get this structural information is by specifying a chemical reaction network. Have a look at the dMod package on CRAN. The cool thing about knowing the sparistiy is that we don’t waste so much on derivative anyway not needed as they are constant and the ODE integrator can work more efficiently during the Newton iterations.

He continued:

Whenever we have long observation times of patients we usually follow a very regular administration pattern. As the dosing compartement often follows a first-order eliminiation scheme you can drastically simplify things. These first order elimination compartements have analytic solutions

andfor regular dosing patterns you can use the geometric series to sum up an arbitrary number of dosings very quickly. As a result you would just use the solution of the dosing compartment as a forcing function to the rest of the ODE system. This way you integrate out one state and you speedup a lot all the calculations.Moreover, in many applications where we have these long observation times, we actually don’t care so much about getting everything right. We often only have data on steady-state and as such you can simplify drastically the models used wrt to the absorbtion process.

What I am saying is that with the use of our brains we can drastically make our life easier here… now, many people still just dump huge dosing histories into these programs and then complain about long running times…

As I have been burned by performance from this I would say that I found lots of good ways of how to avoid the performance killer imposed by ODE stuff + dosing.

This reminds me of something we did many years ago, at a much simpler level, explaining how analytical and simulation methods could be combined to get inferences for small probabilities.

Things have progressed since this blog post was written. I think there’s been progress on the adjoint system and there’s going to be progress soon on avoiding all the packing/unpacking.

If we want people to use the dosing trick for CVODES, we need to explain it to them. I don’t think we can expect people to come up with that from scratch.

As for parser-defined analytic gradients, I doubt that’s going to be available any time soon. It’d require replicating our autodiff system within the parser as a source-transform autodiff system. An option I hope to get in this year is user-specified derivatives for user-defined functions. Would that be sufficient to deal with the performance issue?

Bob, I think user provided derivatives would be great, combine it with Maxima and you’ve got a lot of possibilities. http://maxima.sourceforge.net/

I’ve never heard of Maxima, but it looks interesting. I’ve used Wolfram Alpha a lot to calculate derivatives, but I’ve never figured out how to deal with matrices.

It’s from 1982 and in Common Lisp, which is so cool. I learned Lisp around that time in an AI class, where I wrote a symbolic differentiator for my final class project.

Providing the analytic gradients is step 1. It turns out that we also should avoid the autodiff system altogether during ode solving as it is significant overhead to use autodiff when you provide the Jacobian analytically. This is why I am trying to integrate the ode solver AMICI with Stan. That is a very promising route to get scalable ode solvers in Stan. The AMICI project auto generates all expressions symbolically and comes with a huge number of features (sparsity, adjoint solving, …). For the moment one has to use within chain parallelization if the problem allows for that like in hierarchical models. This can be done with map rect right now and hopefully soon with reduce sum with greater user friendliness as this will avoid the need for packing and unpacking of user data structures needed to define the model.