I have done some preliminary benchmarking using the Garch(1,1) model (initial volatility as parameter as well for a total of 5 parameters) and the LuaJIT version is roughly 4 times slower (2.6 with the single-pass forward mode differentiation mentioned above).

For the roughly 2000 observations CPU time is dominated by the computation of the log-density and its gradient. Profiling LuaJIT seems to indicate that roughly 50% of the time is spent calling the auto-diff versions of log and sqrt so I’m under the impression that this could be optimised (not necessarily with reverse mode). While I spent some time tuning NUTS I didn’t do the same with the log-density of the Garch, so I’m confident improvements here are possible.

Instead sampling 1e5 samples from the 3-dimensional normal distribution below seems to be 5 times faster with LuaJIT: 1.1 seconds versus 5 seconds with Stan. (Stan is saving the samples to the output.csv (no idea how to disable that), while I’m computing the covariance matrix in realtime). Let me know if I’ve done mistakes in the definition below, I’m not too familiar with Stan.

Just for precision: all of that with Stan 2.2.0. So all in all it seems the results depend on the model.

transformed data {

matrix[3,3] Sigma;

vector[3] mu;

mu[1] <- 0.0;

mu[2] <- 0.0;

mu[3] <- 0.0;

Sigma[1,1] <- 6.727102;

Sigma[1,2] <- -1.132535;

Sigma[1,3] <- -3.381619;

Sigma[2,1] <- -1.132535;

Sigma[2,2] <- 2.149042;

Sigma[2,3] <- -0.122779;

Sigma[3,1] <- -3.381619;

Sigma[3,2] <- -0.122779;

Sigma[3,3] <- +2.061933;

}

parameters {

vector[3] y;

}

model {

y ~ multi_normal(mu,Sigma);

}

Thanks — I meant to include the link you sent me for your NUTS implementation, but somehow wrote that whole post without it.

Have you profiled it versus Stan for any models? We’re also working on an evaluation setup and coming up with models we want to use for benchmarking.

]]>Almost everything is in Lua: the libraries, NUTS, the target density.

This has advantages (completely generic density specification) and disadvantages (less ‘model-oriented’ syntax).

Performance should be good (bearing in mind gradient computations scale linearly in the number of dimensions), same order of an hypothetical equivalent implementation in a compiled language.

The exact link for NUTS is: http://www.scilua.org/sci_mcmc.html

Stefano (main author)

]]>