The most recent release of SciLua includes an implementation of Matt’s sampler, NUTS (link is to the final *JMLR* paper, which is a revision of the earlier *arXiv* version).

According to the author of SciLua, Stefano Peluchetti:

Should be quite similar to your [Stan’s] implementation with some differences in the adaptation strategy.

If you have time to have a look and give it a try I would appreciate your feedback. My website includes the LuaJIT binaries for 3 arches [Windows, Mac OS X, Linux], and the Xsys and Sci libraries needed to run NUTS.

I’ve also been working on a higher-level interface with some test models and datasets (the user can specify a simple script to describe the model, but it’s always Lua syntax, not a custom DSL).

Please notice that multi-pass forward automatic differentiation is used (I experimented with a single-pass, tape based, version but the speed-up was not big and for high-dimensional problems one wants to use reverse diff in any case, on my TODO).

For complex models you might want to increase loopunroll a little, and maybe callunroll as well. Feel free to ask any question.

I haven’t had time to try it myself. Looks like it’s implemented using a just-in-time compiler (JIT), like Julia, with models specified in its own language, Lua.

To clarify what Bob said: yes, SciLua (which includes NUTS) builds on top of LuaJIT which is a JIT compiler for Lua.

Almost everything is in Lua: the libraries, NUTS, the target density.

This has advantages (completely generic density specification) and disadvantages (less ‘model-oriented’ syntax).

Performance should be good (bearing in mind gradient computations scale linearly in the number of dimensions), same order of an hypothetical equivalent implementation in a compiled language.

The exact link for NUTS is: http://www.scilua.org/sci_mcmc.html

Stefano (main author)

Thanks — I meant to include the link you sent me for your NUTS implementation, but somehow wrote that whole post without it.

Have you profiled it versus Stan for any models? We’re also working on an evaluation setup and coming up with models we want to use for benchmarking.

I have done some preliminary benchmarking using the Garch(1,1) model (initial volatility as parameter as well for a total of 5 parameters) and the LuaJIT version is roughly 4 times slower (2.6 with the single-pass forward mode differentiation mentioned above).

For the roughly 2000 observations CPU time is dominated by the computation of the log-density and its gradient. Profiling LuaJIT seems to indicate that roughly 50% of the time is spent calling the auto-diff versions of log and sqrt so I’m under the impression that this could be optimised (not necessarily with reverse mode). While I spent some time tuning NUTS I didn’t do the same with the log-density of the Garch, so I’m confident improvements here are possible.

Instead sampling 1e5 samples from the 3-dimensional normal distribution below seems to be 5 times faster with LuaJIT: 1.1 seconds versus 5 seconds with Stan. (Stan is saving the samples to the output.csv (no idea how to disable that), while I’m computing the covariance matrix in realtime). Let me know if I’ve done mistakes in the definition below, I’m not too familiar with Stan.

Just for precision: all of that with Stan 2.2.0. So all in all it seems the results depend on the model.

transformed data {

matrix[3,3] Sigma;

vector[3] mu;

mu[1] <- 0.0;

mu[2] <- 0.0;

mu[3] <- 0.0;

Sigma[1,1] <- 6.727102;

Sigma[1,2] <- -1.132535;

Sigma[1,3] <- -3.381619;

Sigma[2,1] <- -1.132535;

Sigma[2,2] <- 2.149042;

Sigma[2,3] <- -0.122779;

Sigma[3,1] <- -3.381619;

Sigma[3,2] <- -0.122779;

Sigma[3,3] <- +2.061933;

}

parameters {

vector[3] y;

}

model {

y ~ multi_normal(mu,Sigma);

}