*by Andrew Gelman and Bob Carpenter*

We’ve been talking about some of the many ways that parallel computing is, or could be, used in Stan. Here are a few:

– Multiple chains (Stan runs 4 or 8 on my laptop automatically; see the sketch after this list)

– Hessians scale linearly in computation with dimension and are super useful. And we now have a fully vetted forward mode other than for ODEs.

– EP (expectation propagation with data partitioning)

– Running many parallel chains, stopping perhaps before convergence, and weighting them using stacking (Yuling and I are working on a paper on this)

– Bob’s idea of using many parallel chains spawned off an optimization, as a way to locate the typical set during warmup

– Generic MPI for multicore in-box and out-of-box parallel density evaluation

– Multithreading for parallel forward and backward time exploration in HMC

– Multithreading for parallel density evaluation

– GPU kernelization of sequence operations

– Multithreading for multiple outcomes in density functions

– Then there’s all the SSE optimization down at the CPU level for pipelining.
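To make the “multiple chains” item concrete, here is a minimal sketch of between-chain parallelism from Python with CmdStanPy. The model file and data below are placeholders, not anything from this post:

```python
from cmdstanpy import CmdStanModel

# Hypothetical Stan program and data; any model will do.
model = CmdStanModel(stan_file="model.stan")
data = {"N": 100, "y": [0.1] * 100}

# Four chains, each on its own core: between-chain parallelism is the
# easiest win and needs no changes to the Stan program itself.
fit = model.sample(data=data, chains=4, parallel_chains=4)
print(fit.summary())
```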

P.S. Thanks to Zad for the above image demonstrating parallelism.

Say I have a nice-looking model written in Stan. What is the easiest way to do a simulation study to investigate the operating characteristics of said model? Can I run Stan in parallel, once for every different simulated dataset?

Harlan:

Yes, you can do that, no problem.

Great! Any link to a tutorial or example?
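One possible setup (a sketch, not an official tutorial): simulate the datasets up front, then farm the fits out over a process pool with CmdStanPy. The model file, the data-generating process, and the parameter name `mu` below are all placeholders:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from cmdstanpy import CmdStanModel

def simulate(seed):
    # Placeholder data-generating process for the simulation study.
    rng = np.random.default_rng(seed)
    y = rng.normal(loc=1.0, scale=2.0, size=100)
    return {"N": len(y), "y": y.tolist()}

def fit_one(seed):
    # Each worker reuses the cached executable and runs its own fit.
    model = CmdStanModel(stan_file="model.stan")  # hypothetical model file
    fit = model.sample(data=simulate(seed), chains=2, seed=seed,
                       show_progress=False)
    # Return whatever operating characteristic you care about, e.g. the
    # posterior mean of a (hypothetical) parameter called mu.
    return fit.summary().loc["mu", "Mean"]

if __name__ == "__main__":
    # Compile once up front so every worker finds a cached executable.
    CmdStanModel(stan_file="model.stan")
    with ProcessPoolExecutor(max_workers=4) as pool:
        estimates = list(pool.map(fit_one, range(100)))
    print(np.mean(estimates), np.std(estimates))
```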

I heard recently on a C++ podcast about an NVIDIA compiler that can convert C++17 code into parallel code that runs on a GPU (https://developer.nvidia.com/blog/accelerating-standard-c-with-gpus-using-stdpar/). Is this related to the “GPU kernelization of sequence operations” mentioned above?

“Hessians scale linearly in computation with dimension” — How is this true? I mean, at a minimum, a Hessian of dimension D has D(D+1)/2 unique entries.
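One way to square this with the linear-scaling claim (my reading, not a reply from the original thread): with forward-over-reverse automatic differentiation, the full Hessian is assembled from D Hessian-vector products, each costing a small constant multiple of one gradient evaluation, and those D products are independent, which is what makes them easy to parallelize. A minimal illustration in Python with JAX, using a made-up log density:

```python
import jax
import jax.numpy as jnp

# Made-up unnormalized log density over D parameters.
def log_density(theta):
    return -0.5 * jnp.sum(theta ** 2) + jnp.sum(jnp.sin(theta))

# Forward-over-reverse: each column of the Hessian is one Hessian-vector
# product, i.e. one forward sweep over the (reverse-mode) gradient.
hessian = jax.jacfwd(jax.grad(log_density))

theta = jnp.arange(5.0)   # D = 5
H = hessian(theta)        # D independent sweeps -> D x D matrix
print(H.shape)            # (5, 5); total cost is roughly D gradient evaluations
```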