The just-released version 2.14.0 of the R package brms supports within-chain parallelization of Stan. This new functionality is based on the recently introduced `reduce_sum` function in Stan, which evaluates sums over (conditionally) independent log-likelihood terms in parallel, using multiple CPU cores at the same time via threading. The idea of `reduce_sum` is to exploit the associativity and commutativity of the sum operation, which allows any large sum to be split into many smaller partial sums.
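To make the partial-sum idea concrete, here is a minimal base R sketch. It is not how `reduce_sum` is implemented in Stan. The simulated data, the standard normal likelihood, and the choice of four slices are all assumptions made for illustration, and the partial sums are computed sequentially rather than on separate cores.

```r
# Hypothetical per-observation log-likelihood terms (standard normal model).
set.seed(1)
y <- rnorm(1000)
log_lik_terms <- dnorm(y, mean = 0, sd = 1, log = TRUE)

# Split the observation indices into 4 contiguous slices,
# one per hypothetical worker thread.
n_slices <- 4
idx <- seq_along(y)
slices <- split(idx, cut(idx, n_slices, labels = FALSE))

# Each partial sum depends only on its own slice, so the slices could be
# evaluated on different cores. Here they are evaluated one after another.
partial_sums <- vapply(
  slices,
  function(i) sum(log_lik_terms[i]),
  numeric(1)
)

# Adding up the partial sums recovers the full log-likelihood
# (up to floating-point rounding).
total_from_slices <- sum(partial_sums)
total_direct <- sum(log_lik_terms)
print(all.equal(total_from_slices, total_direct))
```

Because addition is associative and commutative, neither the order in which the slices finish nor the way the terms are grouped changes the total beyond rounding error.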

Paul Bürkner did an amazing job enabling within-chain parallelization via threading for the broad range of models supported by brms. Note that threading is currently only available with the CmdStanR backend of brms, since the minimum Stan version supporting `reduce_sum` is 2.23 and rstan is still at 2.21. It may take some time until rstan can directly support threading, but once a backend is configured, users will usually not notice any difference between the two.
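As a rough sketch of what requesting threading looks like from R: the wrapper name `fit_threaded` and its defaults are hypothetical, while `backend = "cmdstanr"`, the `threads` argument, and `threading()` are the brms (>= 2.14.0) interface as we understand it. Running the function requires brms, cmdstanr, and a working CmdStan installation.

```r
# Hypothetical helper: fit a brms model with within-chain threading.
# Defining it needs no packages; calling it needs brms and cmdstanr.
fit_threaded <- function(formula, data, family,
                         chains = 4, threads_per_chain = 2) {
  brms::brm(
    formula = formula,
    data = data,
    family = family,
    chains = chains,
    cores = chains,                              # run the chains in parallel
    backend = "cmdstanr",                        # threading needs this backend
    threads = brms::threading(threads_per_chain) # threads within each chain
  )
}

# Total CPU cores in use when all chains run at once (assumed defaults).
cores_needed <- 4 * 2
print(cores_needed)
```

For example, with the `epilepsy` data shipped with brms, a call could look like `fit_threaded(count ~ zAge + zBase * Trt + (1 | patient), data = epilepsy, family = poisson())`. With the defaults above this uses up to 4 × 2 = 8 cores, so lower `chains` or `threads_per_chain` on smaller machines.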

We encourage users to read the new threading vignette to get an intuition for the new feature and for the speedups they can expect for their model. The speed gain from adding more CPU cores per chain will depend on many model details. In brief:

- Stan models taking days or hours can run in a few hours or minutes, but models that run in just a few minutes will be hard to accelerate
- Models with computationally expensive likelihoods will parallelize better than those with cheap-to-calculate ones, such as a normal or a Bernoulli likelihood
- Non-hierarchical models and hierarchical models with few groupings will benefit greatly from parallelization, while hierarchical models with many random effects will gain somewhat less in speed

The new threading feature is marked as "experimental" in brms, since it is entirely new and some details may need to change as we gain further experience with it. We look forward to hearing from users on the Stan Discourse forums about their experiences with the new feature.

Cool! Thanks for the heads up. I’m a big fan of the brms package.

Just tried it out. The speed gain is quite impressive! Thanks for the nice work.

Big `brms` fan here as well. This is the first crack in one of the remaining barriers to Bayesian-MCMC adoption more broadly.