by Andrew Gelman and Bob Carpenter
We’ve been talking about the many ways that parallel computing is, or could be, used in Stan. Here are a few:
– Multiple chains (Stan runs 4 or 8 on my laptop automatically)
– Hessians scale linearly in computation with dimension (in units of gradient evaluations) and are super useful. And we now have a fully vetted forward mode, except for ODEs.
– Expectation propagation (EP) with data partitioning
– Running many parallel chains, stopping perhaps before convergence, and weighting them using stacking (Yuling and I are working on a paper on this)
– Bob’s idea of using many parallel chains spawned off an optimization, as a way to locate the typical set during warmup
– Generic MPI for multicore parallel density evaluation, both in-box and out-of-box (see the sketch after this list)
– Multithreading for parallel forward and backward time exploration in HMC
– Multithreading for parallel density evaluation
– GPU kernelization of sequence operations
– Multithreading for multiple outcomes in density functions
– Then there’s all the SSE optimization down at the CPU level for pipelining.
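To make the parallel-density-evaluation idea concrete, here is a toy Python sketch, not Stan's actual MPI or threading code: because a log density is a sum of independent per-shard terms, each shard can be evaluated on its own core and the partial sums combined. The normal likelihood and all the names here (shard_log_lik, parallel_log_density) are purely illustrative.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def shard_log_lik(args):
    """Normal log likelihood for one shard of the data (toy model)."""
    y_shard, mu, sigma = args
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - 0.5 * ((y_shard - mu) / sigma) ** 2))

def parallel_log_density(y, mu, sigma, n_shards=4):
    # Split the data into shards and evaluate each shard on its own process.
    shards = np.array_split(y, n_shards)
    with ProcessPoolExecutor(max_workers=n_shards) as pool:
        parts = pool.map(shard_log_lik, [(s, mu, sigma) for s in shards])
    # The full log density is just the sum of the shard contributions.
    return sum(parts)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.normal(0.3, 1.2, size=100_000)
    print(parallel_log_density(y, mu=0.3, sigma=1.2))
```

The same map-reduce structure is what makes both the MPI (across boxes) and the multithreading (within a box) versions of this work: only the partial sums, not the data, need to travel back to the coordinating process.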
P.S. Thanks to Zad for the above image demonstrating parallelism.
Say I have a nice-looking model written in Stan. What is the easiest way to do a simulation study to investigate the operating characteristics of said model? Can I run Stan in parallel, once for each simulated dataset?
Harlan:
Yes, you can do that, no problem.
Great! Is there a link to a tutorial or an example?
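For what it's worth, here is a minimal sketch of that kind of embarrassingly parallel simulation study in Python with CmdStanPy. The model file name (sim_model.stan), the data simulator, and the coverage check are all made up for illustration; the point is just that each replication is an independent fit, so they map cleanly onto a process pool.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from cmdstanpy import CmdStanModel

MODEL_FILE = "sim_model.stan"  # hypothetical model with data y and parameter mu

def one_replication(seed):
    rng = np.random.default_rng(seed)
    y = rng.normal(0.5, 1.0, size=50)           # simulate one dataset
    model = CmdStanModel(stan_file=MODEL_FILE)  # already compiled, so this is cheap
    fit = model.sample(data={"N": len(y), "y": y.tolist()},
                       chains=2, seed=seed, show_progress=False)
    draws = fit.stan_variable("mu")
    # One operating characteristic: does the 90% interval cover the truth?
    lo, hi = np.quantile(draws, [0.05, 0.95])
    return lo <= 0.5 <= hi

if __name__ == "__main__":
    CmdStanModel(stan_file=MODEL_FILE)  # compile once up front
    with ProcessPoolExecutor(max_workers=8) as pool:
        coverage = list(pool.map(one_replication, range(100)))
    print("90% interval coverage:", np.mean(coverage))
```

The same pattern works with mclapply or a job scheduler in R; the only real requirement is compiling the model once before fanning out the replications.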
I heard recently on a C++ podcast about an NVIDIA compiler that can convert standard C++17 code into parallel code that runs on a GPU (https://developer.nvidia.com/blog/accelerating-standard-c-with-gpus-using-stdpar/). Is this related to the “GPU kernelization of sequence operations” mentioned above?
“Hessians scale linearly in computation with dimension” — How is this true? I mean, at a minimum, a Hessian of dimension D has D(D+1)/2 unique entries.
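For what it's worth, the linear-scaling claim presumably counts cost in gradient evaluations rather than matrix entries. With forward-over-reverse automatic differentiation, one Hessian-vector product costs a constant multiple of a single gradient evaluation,

\[ Hv = \nabla_\theta\!\left(v^\top \nabla_\theta \log p(\theta)\right), \]

and stacking the products \(He_1, \dots, He_D\) for the standard basis vectors \(e_d\) recovers the full Hessian. That is \(O(D)\) gradient-cost passes in total, even though \(H\) does indeed have \(D(D+1)/2\) unique entries.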