Mark Girolami points us to this paper and software (with Oksana Chkrebtii, David Campbell, and Ben Calderhead). They write:

We develop a general methodology for the probabilistic integration of differential equations via model based updating of a joint prior measure on the space of functions and their temporal and spatial derivatives. This results in a posterior measure over functions reflecting how well they satisfy the system of differential equations and corresponding initial and boundary values. We show how this posterior measure can be naturally incorporated within the Kennedy and O’Hagan framework for uncertainty quantification and provides a fully Bayesian approach to model calibration. . . . A broad variety of examples are provided to illustrate the potential of this framework for characterising discretization uncertainty, including initial value, delay, and boundary value differential equations, as well as partial differential equations. We also demonstrate our methodology on a large scale system, by modeling discretization uncertainty in the solution of the Navier-Stokes equations of fluid flow, reduced to over 16,000 coupled and stiff ordinary differential equations.

This looks interesting and potentially very important. In the world of applied math, the problem is sometimes called “uncertainty quantification” or UQ; in statistics we call it Bayesian inference. In any case it can be difficult because for big problems these differential equations can take a long time to (numerically) solve. So there would be a lot of use for a method that could take random samples from the posterior distribution, to quantify uncertainty without requiring too many applications of the differential equation solver.

One challenge is that when you run a differential equation solver, you choose a level of discretization. Too fine a discretization and it runs too slow; too coarse and you’re not actually evaluating the model you’re interested in. I’ve only looked quickly at this new paper of Chkrebtii et al., but it appears that they are explicitly modeling this discretization error rather than following the usual strategy of applying a very fine grid and then assuming the error is zero. The idea, I assume, is that if you model the error you can use a much coarser grid and still get good results. This seems plausible given that, to apply any of these methods, you need to run the solver many, many times.

As the above image from their paper illustrates, each iteration of the algorithm proceeds by running a stochastic version of the differential-equation solver. As they emphasize, they do *not* simply repeatedly run the solver as is; rather they go inside the solver and make it stochastic as a way to introduce uncertainty into each step. I haven’t read through the paper more than this but it looks very interesting and it feels right in the sense that they’ve added some items to the procedure so it’s not like they’re trying to get something for nothing.
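(Their construction is based on Gaussian-process updating of the solution, not simple noise injection, but just to give the flavor of “making the solver stochastic,” here is a toy illustration: a forward Euler step perturbed at each step by noise on the scale of the local truncation error. This is a made-up sketch, not the paper’s algorithm.)

```python
import random

def stochastic_euler(f, y0, t0, t1, n, noise_scale=1.0, seed=0):
    """Toy 'probabilistic' forward Euler: each step is perturbed by
    Gaussian noise scaled like the local truncation error O(h^2).
    Illustration only -- not the GP-based method in the paper."""
    rng = random.Random(seed)
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = y + h * f(t, y) + noise_scale * h**1.5 * rng.gauss(0.0, 1.0)
        t += h
    return y

# Repeated runs give an ensemble whose spread reflects discretization
# uncertainty; e.g. for y' = -y, y(0) = 1 on [0, 1]:
samples = [stochastic_euler(lambda t, y: -y, 1.0, 0.0, 1.0, 100, seed=s)
           for s in range(200)]
mean = sum(samples) / len(samples)
```

The ensemble spread, not just the point estimate, is the output of interest here.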

Also, movies:

In practice, it is pretty simple to simulate a DiffEQ model with one step size and then cut the step size in half and see if the results change. If they do, your simulation grid is too coarse; if they don’t, then you can probably use the coarser step-size without worry.

Not extremely rigorous, but very effective and practical as a rule of thumb for determining an accurate simulation step size.
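That halving check is easy to automate. A minimal sketch, with forward Euler standing in for the solver and an arbitrary tolerance:

```python
def euler_solve(f, y0, t0, t1, h):
    """Fixed-step forward Euler; returns the final state."""
    t, y = t0, y0
    while t < t1 - 1e-12:
        step = min(h, t1 - t)  # don't overshoot the endpoint
        y = y + step * f(t, y)
        t += step
    return y

def step_size_ok(f, y0, t0, t1, h, tol=1e-3):
    """Step-halving check: rerun with h/2 and compare final states.
    If the answers agree to within tol, h is probably fine."""
    y_h = euler_solve(f, y0, t0, t1, h)
    y_half = euler_solve(f, y0, t0, t1, h / 2)
    return abs(y_h - y_half) < tol

# y' = -y on [0, 1]: a coarse grid fails the check, a fine one passes.
coarse_ok = step_size_ok(lambda t, y: -y, 1.0, 0.0, 1.0, 0.5)
fine_ok = step_size_ok(lambda t, y: -y, 1.0, 0.0, 1.0, 0.001)
```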

Lots of more advanced solvers will dynamically adjust the step size as your simulation progresses, based on current rates of change in the model (e.g., periods of rapid change use a small step size; periods of slow change use a coarser one), in order to keep the error below some desired maximum.
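A bare-bones version of that adaptive idea, using step doubling with forward Euler to estimate the local error (production codes use embedded Runge–Kutta pairs and much more careful control logic; this is only a sketch):

```python
def adaptive_euler(f, y0, t0, t1, h0=0.1, tol=1e-4):
    """Adaptive forward Euler via step doubling: take one step of size h
    and two of size h/2; their difference estimates the local error.
    Shrink h during fast change, grow it during slow change."""
    t, y, h = t0, y0, h0
    steps = 0
    while t < t1:
        h = min(h, t1 - t)
        one = y + h * f(t, y)                    # single step of size h
        half = y + (h / 2) * f(t, y)             # two steps of size h/2
        two = half + (h / 2) * f(t + h / 2, half)
        err = abs(two - one)
        if err <= tol:
            t, y = t + h, two
            steps += 1
            if err < tol / 4:
                h *= 2   # error comfortably small: grow the step
        else:
            h /= 2       # error too large: shrink and retry
    return y, steps

# y' = -y, y(0) = 1 on [0, 1]; exact answer is exp(-1) ~ 0.368.
y_final, steps = adaptive_euler(lambda t, y: -y, 1.0, 0.0, 1.0)
```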

Scott:

Yes, that makes sense. I think the idea of the paper discussed above is that, if you’re trying to quantify uncertainty, you can go a lot further by incorporating the uncertainty into the solver itself, rather than following the traditional path of just varying the inputs and running the deterministic solver a zillion times.

There’s an “easier” way to do this, where you add to the observation equation a structured random effect that tries to capture the difference between the true dynamics and the discrete dynamics. And it works pretty well for lots of problems.
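A sketch of what that might look like, with an AR(1) discrepancy term standing in for the “structured random effect” (all parameter values here are illustrative):

```python
import random

def simulate_observations(solver_traj, rho=0.9, sigma_delta=0.05,
                          sigma_eps=0.01, seed=1):
    """Observation model  y_i = x_i + delta_i + eps_i,  where x_i is the
    discrete solver output, delta is an AR(1) discrepancy capturing the
    solver-vs-truth gap (correlated in time), and eps is iid noise.
    In a real calibration, rho and the sigmas would get priors and be
    inferred alongside the ODE parameters."""
    rng = random.Random(seed)
    delta, ys = 0.0, []
    for x in solver_traj:
        delta = rho * delta + sigma_delta * rng.gauss(0.0, 1.0)
        ys.append(x + delta + sigma_eps * rng.gauss(0.0, 1.0))
    return ys

# Illustrate on a flat 'solver trajectory':
ys = simulate_observations([1.0] * 100)
```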

But when the dynamics of the system are unstable (there’s a chaotic example in the paper and another one on the website), the evolving discrepancy is hard to capture and the solution in this paper starts moving towards “stuff that maybe wasn’t possible before”.

And, let’s face it, this is a sexier solution than calling SUNDIALS.

(There are also advantages to removing the natural bias in numerical methods. Look at, for example, quasi-Monte Carlo vs randomised QMC)
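To make the QMC point concrete: a Cranley–Patterson random shift turns a deterministic low-discrepancy rule into an unbiased estimator whose error can be assessed by replication. A one-dimensional sketch using a van der Corput sequence:

```python
import random

def van_der_corput(n, base=2):
    """First n points of the van der Corput low-discrepancy sequence."""
    pts = []
    for i in range(n):
        q, denom, x = i, 1.0, 0.0
        while q:
            q, r = divmod(q, base)
            denom *= base
            x += r / denom
        pts.append(x)
    return pts

def rqmc_estimate(f, n=256, reps=20, seed=0):
    """Randomized QMC: apply an independent uniform shift (mod 1) to the
    same point set in each replicate. Each replicate is unbiased, and
    the spread across replicates estimates the error."""
    rng = random.Random(seed)
    base_pts = van_der_corput(n)
    ests = []
    for _ in range(reps):
        u = rng.random()
        ests.append(sum(f((x + u) % 1.0) for x in base_pts) / n)
    return sum(ests) / reps, ests

# Integral of x^2 on [0, 1] is 1/3.
mean, ests = rqmc_estimate(lambda x: x * x)
```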

Trying to add a little bit to the discussion, could you please consider looking at this manuscript?

http://arxiv.org/abs/1311.2281

Marcos, a very nice point you make. Typically, in the context of modeling error, input uncertainty, and measurement noise, taking a very small step size is a total and utter waste of time.

One thing that I think holds people back in appreciating these things is that many applied mathematicians take the ODE/PDE as GIVEN and CORRECT. The truth is, this forward map is just a statistical approximation to begin with. We KNOW before we even run the solver that fluids do not obey the Navier-Stokes equations in any form. They obey something much closer to Newtonian Molecular Dynamics, and if you want to get really pedantic, Schrödinger’s equation or even some relativistic version.

So, taking the ODE/PDE as the *correct* model which needs to be solved accurately is already the wrong perspective. The right perspective is to say that these ODE/PDE models are themselves statistical models which we should expect to have errors. Some people want to add independent identical error increments, resulting in Stochastic DE models, but this is also wrong. Errors in the model are not independent of each other at nearby time points.
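One way to make the correlated-error point concrete is to drive the ODE with an Ornstein–Uhlenbeck error process rather than independent increments, so the model error at nearby times is correlated (a sketch; the parameter values are made up):

```python
import math
import random

def euler_with_ou_error(f, y0, t0, t1, n, tau=0.5, sigma=0.1, seed=0):
    """Integrate  y' = f(t, y) + e(t),  where e follows an
    Ornstein-Uhlenbeck process with correlation time tau:
        de = -(e / tau) dt + sigma dW.
    Unlike iid increments, e(t) and e(t') are correlated for
    |t - t'| on the order of tau, mimicking slowly varying model
    error. Parameters are illustrative."""
    rng = random.Random(seed)
    h = (t1 - t0) / n
    t, y, e = t0, y0, 0.0
    for _ in range(n):
        y = y + h * (f(t, y) + e)
        e = e - h * e / tau + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0)
        t += h
    return y

# y' = -y + e(t), y(0) = 1 on [0, 1]:
y_end = euler_with_ou_error(lambda t, y: -y, 1.0, 0.0, 1.0, 200)
```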

As long as you’re making all these other errors, it’s insane to demand that discretization and integration error be many orders of magnitude smaller than these other errors. Especially considering the CPU cost.

You’re right about model misspecification, but that’s always true. The only way around it is to make your assumptions, ride them as hard as you can and then perform post analysis sanity checks. All this is much easier if you’re only dealing with one source of error (the model) rather than two (model + discretisation).

But it does make me wonder if there’s scope to treat ODEs in an “expert elicitation” framework by converting the ODE into a prior over the space of dynamical systems (or, I guess, the space of Markovian mappings). Chris Holmes and James Watson did a “post hoc” version of this in a different context, but I wonder if it can be expanded into the modelling phase. Or, more importantly, if it’s possible to fit that type of model to data.

In modelling turbulence the issue of model + discretisation error and their coupling is serious and people from the group of Bob Moser at UT Austin have been making a foray into Bayesian approaches to this.

Interesting point about converting the ODE into a prior over the space of dynamical systems – in many ways this is what has been done in our paper via the joint conditioning on state and derivatives given the right-hand side of the system. Need to speak with Prof. Holmes about his work on genetics in this regard.

Thanks to all who commented on our recent paper. Let me just follow up on some of these comments.

Scott – You are absolutely right that standard numerical methods for solving differential equations have many approaches to tuning, checking convergence, accelerating progress using e.g. Jacobian structures, pre-conditioning of equations to obtain better-conditioned systems, and so on. There has been well over half a century of formal analysis, algorithmic design, and engineering tuning of these software codes to bring them to the high level of sophistication and efficiency they currently possess. They are, however, devoid of any probabilistic semantics.

As Andrew pointed out, all these codes are predicated on a fully deterministic forward solve of the system. However, it has been clear for some time to Applied Mathematicians, Engineers, Physicists, and Mathematical Modelers in general that all sources of uncertainty *must* be accounted for, and furthermore that this uncertainty must be propagated coherently through the whole modeling, simulation, analysis, and reasoning process. In this paper I have argued, as John Skilling and Persi Diaconis did almost 30 years ago, that the probability calculus provides the framework to achieve this.

The two main “Known Unknowns” are (1) model inadequacy and (2) numerical discretization error, and these are typically characterized by various error bounds in the current literature. We can do better. I adopted the Kennedy and O’Hagan framework for this research as both forms of Known Unknowns can be formally accounted for in a probabilistic setting. Daniel Lakeland alludes to this in noting that the ODE/PDE systems are not the *correct* physical system and we should admit this and account for it.

When I wrote the abstract for the paper I tried to argue that having a well-defined probability measure on the infinite-dimensional space of functions consistent with the system of differential equations allows the modeler to pose and answer questions such as “what level of discretisation is required before our overall uncertainty gets beyond limits we deem unacceptable when making subsequent inference about the system model?” The paper Marcos pointed to is suggestive of this point. Furthermore, as Dan Simpson mentions, taking the view that the ‘forward solve’ is drawing samples of functions from a probability measure may also highlight longer-term instabilities without recourse to running the solver “zillions” of times.

Finally, the algorithm we developed in the paper is most certainly not the last word on ‘probabilistic solvers’. It is but a first attempt, a proof of concept if you will, at providing a sampling-based method that at least has the comfort of consistency guarantees in the case of IVPs. However, there is no doubt that smarter methods that are more efficient in time, computation, and statistical terms, along with their associated algorithms, will, and must, be developed.

I make the case in the discussion of the paper that the same order of algorithmic scaling can be achieved for Probabilistic Integrators as for existing deterministic codes. Though I prefer to think of them as Sampling schemes from Probability Measures over the Functional Space.

To enable the quantification and propagation of uncertainty in the modeling process, I foresee a time when suites of highly efficient codes that allow this form of functional sampling will be part of the day-to-day toolbox of Applied Mathematicians, Modelers, and Engineers, just as non-probabilistic methods such as Crank-Nicolson are today.

Mark,

it seems quite reasonable that you focus primarily on numerical discretization error. Your interest is not in the ‘correct’ physical modelling of a phenomenon, but rather in the use of discretized numerical methods to accurately capture the dynamics of a continuous ODE. The interesting thing is that you have used Bayesian methods / uncertainty to try to identify the most likely (continuous) solutions to the true continuous ODE, while most numerical analysts will look at the various inaccuracies of the numerical schemes, the odd behaviour of discretized systems, and the various oddities that discretization induces (i.e., artifacts of the discretization). You instead want to focus on the true ODE (note I did not say true model) and what (discrete) numerical information you have in order to reduce the uncertainty. This is really a novel way of doing things.

Probabilistic methods are gaining ground in areas of numerical analysis previously covered only by deterministic schemes (e.g., Probabilistic Linear Algebra). It seems quite reasonable that probabilistic methods should gain more ground in PDEs/ODEs, although I believe you are only using probabilistic reasoning rather than true randomization of output (i.e., an MCMC). It might be interesting to put both into the picture, which could make for a tight enough posterior with fewer system constraints.