
Parallel JAGS RNGs

As a matter of convention, we usually run 3 or 4 chains in JAGS. By default, each chain draws its samples from a distinct pseudorandom number generator (JAGS ships with four base RNGs). I didn’t go and check whether it does things 111,222,333 or 123,123,123, but in any event the “parallel chains” in JAGS are samples drawn from distinct RNGs computed on a single processor core.

But we all have multiple cores now, or we’re computing on a cluster or in the cloud! So the behavior we’d like from rjags is to run each chain via the foreach package, with each JAGS chain using a parallel-safe RNG. The catch is that with n.chains=1, every parallel instance defaults to .RNG.name[1], the Wichmann-Hill RNG, so all the workers end up with the same generator.

JAGS 2.2.0 includes a new lecuyer module (along with the glm module, which everyone should probably always use, and which doesn’t have many undocumented tricks that I know of). But lecuyer is completely undocumented! I tried .RNG.name="lecuyer::Lecuyer", .RNG.name="lecuyer::lecuyer", and .RNG.name="lecuyer::LEcuyer",
all to no avail. It ought to be .RNG.name="lecuyer::Lecuyer", to be consistent with the other .RNG.name values! I dug around in the source to find where it checks the name from the inits, and discovered that in fact it is

.RNG.name="lecuyer::RngStream"

So here’s how I set up 4 chains now:

library(doMC); registerDoMC()
library(rjags); load.module("glm"); load.module("lecuyer")
library(random)
jinits <- function() {
  list(
    ### all the other params ###
    .RNG.name = "lecuyer::RngStream",
    .RNG.seed = randomNumbers(n = 1, min = 1, max = 1e+06, col = 1)
  )
}
jags.parsamples <- foreach(i=1:getDoParWorkers()) %dopar% {
  model.jags <- jags.model(model, forJAGS,
                           inits=jinits,
                           n.chains=1, n.adapt=1000)
  result <- coda.samples(model.jags,params,1000)
  return(result)
}

I would just as soon initialize them all to the same state and use sequential substreams, but I don’t think there is a way to do that. Four long, separately-seeded streams should be more than fine; a quick look at the source suggests that if you used n.chains>1 (on each core) you’d get sequential substreams.

I should also probably write a better .combine so that the result is an mcmc.list and not just a plain list, but whatever. This works, and it’s almost 4 times faster (yeah yeah, overhead, blah blah) than the usual n.chains=4 would be!
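For what it’s worth, a .combine along these lines should do it. This is only a sketch: combine.mcmc is my own name, and it assumes each worker hands back the length-1 mcmc.list that coda.samples produces for a single chain.

```r
library(coda)

# Flatten several single-chain results into one mcmc.list.
# Each worker returns an mcmc.list of length 1; unlist() one level
# to get the underlying mcmc objects, then rebuild the mcmc.list.
combine.mcmc <- function(...) {
  do.call(mcmc.list, unlist(list(...), recursive = FALSE))
}

jags.parsamples <- foreach(i = 1:getDoParWorkers(),
                           .combine = combine.mcmc,
                           .multicombine = TRUE) %dopar% {
  model.jags <- jags.model(model, forJAGS, inits = jinits,
                           n.chains = 1, n.adapt = 1000)
  coda.samples(model.jags, params, 1000)
}
```

With .multicombine=TRUE, foreach passes all the worker results to combine.mcmc in one call, so the result should behave like the mcmc.list you’d get from an ordinary n.chains=4 run.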

10 Comments

  1. The dclone package also allows for parallel processing through JAGS in one line, via the jags.parfit function.

  2. I just checked: dclone does not deal with the JAGS RNG at all, much less the lecuyer module.

    • Peter Solymos says:

      jags.parfit in dclone does not deal implicitly with RNGs, because that is done by setting up inits as usual for jags.model. The trick is to use jags.model (or jags.fit, in the case of jags.parfit) with 0 iterations to generate the RNG states. These are then used as inits and distributed to the workers.
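      If I’m reading that right, the trick can be sketched roughly like this. The $state(internal = TRUE) call is rjags’s way of extracting per-chain RNG states; whether jags.model hands out RngStream generators by default once the lecuyer module is loaded is an assumption I’d verify first.

```r
library(rjags); load.module("lecuyer")

# Throwaway model with zero adaptation, used only so that JAGS
# assigns a distinct RNG state to each chain up front.
mod0 <- jags.model(model, forJAGS, n.chains = 4, n.adapt = 0)

# internal = TRUE includes the .RNG.name / .RNG.state entries
chain.inits <- mod0$state(internal = TRUE)

# Each worker then gets one saved state as its inits.
jags.parsamples <- foreach(i = 1:4) %dopar% {
  m <- jags.model(model, forJAGS, inits = chain.inits[[i]],
                  n.chains = 1, n.adapt = 1000)
  coda.samples(m, params, 1000)
}
```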

  3. Let's wrap it all up and put it in jags() in the R2jags package.

  4. Xavier says:

    JAGS runs parallel chains on different cores when the BLAS and LAPACK libraries are configured to do so. In any GNU/Linux distribution, JAGS can be compiled against the threaded ATLAS library, so that by default it uses different cores for each chain. I think that is the easiest way.

  5. lingpipe says:

    Why should we be using L'Ecuyer's random-number generator? (That was a real question, not rhetorical.) If it uses a different instance per thread, then the instances don't need to be thread safe. I don't think the generators use huge amounts of memory to store their state, so there doesn't seem to be much to be saved from sharing a random-number generator across threads.

    • Peter Solymos says:

      I think L'Ecuyer's RNG kicks in when one wishes to run more than 4 chains in a thread-safe way. Otherwise, JAGS comes with 4 kinds of RNGs, and jags.model assigns different RNGs by default when they are not declared otherwise in the inits. I haven't tried L'Ecuyer with my dclone package, but I will do so soon enough.

  6. Niels says:

    Would it be possible to have JAGS use different data for each of the parallel chains? This would be convenient when using multiply imputed datasets: the same model needs to be run on each dataset, and an option to estimate these chains in parallel would speed things up considerably. Any pointers on how to accomplish this would be greatly appreciated.

    • @Niels: I’ve wrapped the calls to the JAGS functions into one which accepts just a list with all the control arguments (data and inits), then I use it as the first argument with foreach/%dopar% and the L’Ecuyer RNG. The main point is that foreach can take an arbitrary list to work with; this is not demonstrated above, but it is in the foreach docs. Each element of the list can be the setup for a single run, including data, inits, and control parameters for JAGS. I haven’t gotten around to posting the code yet…
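    A minimal sketch of that shape (run.one and setups are my names, not the commenter’s): each element of setups bundles one imputed dataset with its model, inits, and parameters, and foreach simply walks the list.

```r
library(doMC); registerDoMC()
library(rjags); load.module("lecuyer")

# One self-contained JAGS run per setup (field names are hypothetical)
run.one <- function(setup) {
  m <- jags.model(setup$model, setup$data, inits = setup$inits,
                  n.chains = 1, n.adapt = 1000)
  coda.samples(m, setup$params, 1000)
}

# setups: a list with one element per imputed dataset
results <- foreach(s = setups) %dopar% run.one(s)
```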