[Update: Revised given comments from Wingfeet, Andrew and germo. Thanks! I’d mistakenly translated the dlnorm priors in the first version — amazing what a difference the priors make. I also escaped the less-than and greater-than signs in the constraints in the model so they’re visible. I also updated to match the thin=2 output of JAGS.]
We’re going to be starting a Stan “model of the P” (for some time period P) column, so I thought I’d kick things off with one of my own. I’ve been following the Wingvoet blog, the author of which is identified only by the Blogger handle Wingfeet; a couple of days ago this lovely post came out:
Wingfeet’s post implemented an answer to question 6 of chapter 6 of Rowland and Tozer’s Clinical Pharmacokinetics and Pharmacodynamics, Fourth Edition (Lippincott Williams & Wilkins, 2010).
So in the grand tradition of using this blog to procrastinate, I thought I’d take a break from C++ coding and paper writing and translate Wingfeet’s model to Stan. First, let me repeat Wingfeet’s model in JAGS, exactly as posted:
tau <- 1/pow(sigma,2)
sigma ~ dunif(0,100)
# IV part
kIV ~dnorm(.4,1)
cIV ~ dlnorm(1,.01)
for (i in 1:nIV) {
  predIV[i] <- c0*exp(-k*timeIV[i]) +cIV*exp(-kIV*timeIV[i])
  concIV[i] ~ dnorm(predIV[i],tau)
}
c0 <- doseIV/V
V ~ dlnorm(2,.01)
k <- CL/V
CL ~ dlnorm(1,.01)
AUCIV <- doseIV/CL+cIV/kIV
# oral part
for (i in 1:nO) {
  predO[i] <- c0star*(exp(-k*timeO[i])-exp(-ka*timeO[i]))
  concO[i] ~ dnorm(predO[i],tau)
}
c0star <- doseO*(ka/(ka-k))*F/V
AUCO <- c0star/k
F ~ dunif(0,1)
ka ~dnorm(.4,1)
ta0_5 <- log(2) /ka
t0_5 <- log(2)/k
I still find JAGS models pretty hard to follow on their own, but Wingfeet provided the data and the call to R2jags, so I could work backwards from there. Here's what I came up with as a translation to Stan.
data {
  int<lower=0> nIV;
  int<lower=0> nOral;
  real<lower=0> doseIV;
  real<lower=0> doseOral;
  vector<lower=0>[nIV] timeIV;
  vector<lower=0>[nIV] concIV;
  vector<lower=0>[nOral] timeOral;
  vector<lower=0>[nOral] concOral;
}
parameters {
  real<lower=0> CL;
  real<lower=0> V;
  real<lower=0,upper=1> F;
  real<lower=0> ka;
  real<lower=0> kIV;
  real<lower=0> cIV;
  real<lower=0,upper=100> sigma;
}
transformed parameters {
  real k;
  real c0;
  real AUCIV;
  real c0star;
  real AUCOral;
  real ta0_5;
  real t0_5;
  vector[nIV] predIV;
  vector[nOral] predOral;
  k <- CL / V;
  c0 <- doseIV / V;
  AUCIV <- doseIV / CL + cIV / kIV;
  c0star <- doseOral * (ka / (ka - k)) * F / V;
  AUCOral <- c0star / k;
  ta0_5 <- log(2) / ka;
  t0_5 <- log(2) / k;
  predIV <- c0 * exp(-k * timeIV) + cIV * exp(-kIV * timeIV);
  predOral <- c0star * (exp(-k * timeOral) - exp(-ka * timeOral));
}
model {
  // IV component
  kIV ~ normal(0.4, 1);
  cIV ~ lognormal(1, 10);
  V ~ lognormal(2, 10);
  CL ~ lognormal(1, 10);
  concIV ~ normal(predIV, sigma);
  // oral component
  ka ~ normal(0.4, 1);
  concOral ~ normal(predOral, sigma);
}
Do I ever appreciate the emacs mode for Stan! The Stan model is a direct translation of Wingfeet's JAGS model, with exactly the same priors (Stan parameterizes the normal by its standard deviation, whereas JAGS follows BUGS in using the precision, i.e., inverse variance). I moved all the deterministic nodes in the JAGS model to the transformed parameters block in the Stan model so we could inspect them in the output. I also vectorized the prediction calculations, which seems much cleaner than using loops, but opinions may vary on this point.
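To make the parameterization difference concrete, here's a quick Python sketch (mine, not part of the original post) of the precision-to-standard-deviation conversion that relates the JAGS priors to the Stan ones:

```python
import math

def precision_to_sd(tau):
    # BUGS/JAGS parameterize (log)normals by precision tau = 1 / sd^2;
    # Stan uses the standard deviation sd = 1 / sqrt(tau).
    return 1.0 / math.sqrt(tau)

# dlnorm(1, .01) in JAGS corresponds to lognormal(1, 10) in Stan:
print(precision_to_sd(0.01))   # 10.0
# and dnorm(.4, 1) corresponds to normal(0.4, 1):
print(precision_to_sd(1.0))    # 1.0
```

Getting this conversion wrong was exactly the prior-translation mistake mentioned in the update at the top of the post.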
It was critical to put the lower-bounds on all the concentration and volume parameters in Stan, as well as upper and lower bounds on the noise term (which had a broad uniform distribution in the JAGS model).
Here's the data in Stan's dump format, which I dropped in a file named data.R:
nIV <- 12
nOral <- 11
doseIV <- 500
doseOral <- 500
timeIV <- c(0.33, 0.5, 0.67, 1.5, 2, 4, 6, 10, 16, 24, 32, 48)
concIV <- c(14.7, 12.6, 11, 9, 8.2, 7.9, 6.6, 6.2, 4.6, 3.2, 2.3, 1.2)
timeOral <- c(0.5, 1, 1.5, 2, 4, 6, 10, 16, 24, 32, 48)
concOral <- c(2.4, 3.8, 4.2, 4.6, 8.1, 5.8, 5.1, 4.1, 3, 2.3, 1.3)
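As a sanity check on the algebra, here's a short NumPy sketch (my addition, not from the original post) that plugs the posterior means reported later in the post into the model's prediction equations. Plugging in posterior means only approximates the posterior means of derived quantities, so the agreement with the fitted values is approximate.

```python
import numpy as np

# Data from data.R above
doseIV = doseOral = 500.0
timeIV = np.array([0.33, 0.5, 0.67, 1.5, 2, 4, 6, 10, 16, 24, 32, 48])
timeOral = np.array([0.5, 1, 1.5, 2, 4, 6, 10, 16, 24, 32, 48])

# Posterior means from the fit output (plug-in point estimates)
CL, V, F = 2.35, 56.44, 0.87
ka, kIV, cIV = 0.66, 2.08, 11.30

# Deterministic quantities, exactly as in the transformed parameters block
k = CL / V                                   # elimination rate
c0 = doseIV / V                              # initial IV concentration
c0star = doseOral * (ka / (ka - k)) * F / V  # oral scale factor
AUCIV = doseIV / CL + cIV / kIV
t0_5 = np.log(2) / k                         # elimination half-life

# Vectorized predictions, matching the Stan code
predIV = c0 * np.exp(-k * timeIV) + cIV * np.exp(-kIV * timeIV)
predOral = c0star * (np.exp(-k * timeOral) - np.exp(-ka * timeOral))

print(round(float(t0_5), 1), round(AUCIV, 1))  # about 16.6 and 218.2
```

These land within a couple of percent of the posterior means of t0_5, AUCIV, predIV, and predOral in the tables below, as expected for a plug-in approximation.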
Here's the code to run it in RStan, with the number of warmup and sampling iterations matching Wingfeet's call to the R2jags package.
library('rstan');
source('data.R');
fit <- stan('pk_iv_oral.stan',
            data=c("nIV","nOral","doseIV","doseOral",
                   "timeIV","concIV","timeOral","concOral"),
            chains=4, warmup=5000, iter=14000);
It took about 10s to compile the model and 8s to run 56K iterations in the release version of RStan 2.2.0 (on my several-years-old MacBook Pro with a 2.3GHz Intel Core i7). This uses the default diffuse initialization, drawn uniformly on (-2, 2) on the unconstrained scale, corresponding to roughly (0.1, 10) on the positive-constrained scale.
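For the curious, the constraining transforms behind that initialization can be sketched in a few lines of Python (my illustration, not from the post). The interval case also shows why a doubly bounded parameter like sigma, declared on (0, 100), sits at 50 when its unconstrained value is zero:

```python
import math

def positive(u):
    # real<lower=0>: unconstrained -> constrained via exp
    return math.exp(u)

def interval(u, lo=0.0, hi=100.0):
    # real<lower=0,upper=100>: scaled inverse logit
    return lo + (hi - lo) / (1.0 + math.exp(-u))

# Default inits are drawn uniformly on (-2, 2) on the unconstrained scale:
print(positive(-2), positive(2))  # about 0.135 and 7.39, i.e., roughly (0.1, 10)
print(interval(0.0))              # 50.0: u = 0 maps sigma to the interval midpoint
```

(Stan also adds the log-Jacobian of these transforms to the target density; that bookkeeping is omitted here.)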
The model converged almost instantly and mixes very well, so this number of iterations is rather overkill for Stan. Note that I set Stan to thin to every other draw, to match the JAGS configuration used by Wingfeet.
Inference for Stan model: pk_iv_oral.
4 chains, each with iter=14000; warmup=5000; thin=2;
post-warmup draws per chain=4500, total post-warmup draws=18000.

               mean se_mean    sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
CL             2.35    0.00  0.18   2.00   2.23   2.34   2.46   2.72 13417    1
V             56.44    0.02  2.70  51.73  54.62  56.25  58.00  62.39 12204    1
F              0.87    0.00  0.05   0.77   0.84   0.87   0.91   0.97 12223    1
ka             0.66    0.00  0.10   0.49   0.59   0.65   0.72   0.90 12383    1
kIV            2.08    0.00  0.51   1.16   1.72   2.05   2.41   3.14 11219    1
cIV           11.30    0.02  2.47   7.20   9.57  11.05  12.73  16.90 11978    1
sigma          0.57    0.00  0.11   0.41   0.50   0.56   0.64   0.84 10536    1
k              0.04    0.00  0.00   0.03   0.04   0.04   0.04   0.05 12654    1
c0             8.88    0.00  0.42   8.01   8.62   8.89   9.15   9.67 12448    1
AUCIV        219.90    0.15 16.88 189.43 208.69 219.01 230.17 256.13 13048    1
c0star         8.31    0.01  0.61   7.15   7.91   8.29   8.69   9.56 12398    1
AUCOral      200.25    0.12 14.91 172.41 190.34 199.78 209.56 230.64 14644    1
ta0_5          1.08    0.00  0.16   0.77   0.97   1.07   1.18   1.42 12769    1
t0_5          16.79    0.02  1.69  13.82  15.67  16.64  17.75  20.57 12133    1
predIV[1]     14.36    0.00  0.52  13.29  14.02  14.37  14.70  15.35 13530    1
predIV[2]     12.64    0.00  0.33  11.99  12.43  12.63  12.85  13.29 17290    1
predIV[3]     11.43    0.00  0.35  10.75  11.19  11.42  11.66  12.13 14406    1
predIV[4]      8.92    0.00  0.32   8.33   8.70   8.90   9.11   9.60 14157    1
predIV[5]      8.41    0.00  0.29   7.85   8.22   8.40   8.59   9.00 14461    1
predIV[6]      7.52    0.00  0.28   6.94   7.34   7.53   7.71   8.06 13229    1
predIV[7]      6.91    0.00  0.25   6.39   6.75   6.92   7.08   7.39 13267    1
predIV[8]      5.85    0.00  0.22   5.40   5.71   5.85   6.00   6.28 13885    1
predIV[9]      4.56    0.00  0.23   4.10   4.41   4.56   4.71   5.01 13958    1
predIV[10]     3.27    0.00  0.25   2.78   3.11   3.27   3.43   3.77 13481    1
predIV[11]     2.35    0.00  0.25   1.87   2.19   2.34   2.51   2.86 13122    1
predIV[12]     1.22    0.00  0.21   0.84   1.08   1.20   1.34   1.66 12692    1
predOral[1]    2.14    0.00  0.21   1.74   2.00   2.13   2.27   2.59 14328    1
predOral[2]    3.63    0.00  0.29   3.06   3.43   3.62   3.81   4.23 14973    1
predOral[3]    4.65    0.00  0.30   4.05   4.46   4.65   4.85   5.26 15721    1
predOral[4]    5.35    0.00  0.29   4.76   5.16   5.36   5.54   5.91 16427    1
predOral[5]    6.37    0.00  0.27   5.81   6.20   6.38   6.56   6.89 15049    1
predOral[6]    6.27    0.00  0.30   5.64   6.08   6.28   6.48   6.84 13781    1
predOral[7]    5.45    0.00  0.29   4.87   5.27   5.45   5.64   6.01 13940    1
predOral[8]    4.26    0.00  0.24   3.78   4.10   4.26   4.42   4.73 14936    1
predOral[9]    3.05    0.00  0.22   2.62   2.91   3.05   3.20   3.49 14965    1
predOral[10]   2.19    0.00  0.22   1.78   2.05   2.19   2.33   2.63 14373    1
predOral[11]   1.13    0.00  0.18   0.81   1.01   1.12   1.24   1.51 13457    1
lp__          -2.09    0.03  2.33  -7.80  -3.33  -1.68  -0.37   1.19  7734    1

Samples were drawn using NUTS(diag_e) at Tue Mar 11 01:48:42 2014.
Here's the fit that Wingfeet got from JAGS:
Inference for Bugs model at "C:/...BEnL/model1b6027171a7a.txt", fit using jags,
4 chains, each with 14000 iterations (first 5000 discarded), n.thin = 2
n.sims = 18000 iterations saved

            mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
AUCIV       219.959  16.975 189.743 208.636 218.846 230.210 256.836 1.002  2800
AUCO        200.468  15.102 171.908 190.523 199.987 209.928 231.722 1.001  7500
CL            2.346   0.182   1.995   2.226   2.344   2.461   2.712 1.002  2900
F             0.876   0.052   0.773   0.841   0.876   0.911   0.976 1.002  3200
V            56.463   2.752  51.610  54.583  56.280  58.112  62.465 1.002  3100
c0            8.876   0.426   8.005   8.604   8.884   9.160   9.688 1.002  3100
c0star        8.314   0.615   7.119   7.911   8.304   8.704   9.574 1.002  2000
cIV          11.229   2.485   7.029   9.511  11.003  12.654  16.804 1.003  1100
k             0.042   0.004   0.034   0.039   0.042   0.044   0.050 1.002  2200
kIV           2.064   0.510   1.132   1.704   2.041   2.396   3.126 1.004   800
ka            0.659   0.105   0.489   0.589   0.646   0.715   0.903 1.002  3200
predIV[1]    14.343   0.532  13.240  14.009  14.355  14.690  15.378 1.002  2500
predIV[2]    12.637   0.331  11.990  12.422  12.632  12.851  13.298 1.001 12000
predIV[3]    11.435   0.357  10.755  11.196  11.427  11.667  12.147 1.002  1600
predIV[4]     8.923   0.326   8.333   8.709   8.902   9.116   9.638 1.002  2900
predIV[5]     8.410   0.298   7.845   8.213   8.401   8.594   9.021 1.001 14000
predIV[6]     7.522   0.286   6.943   7.340   7.525   7.713   8.064 1.001  5900
predIV[7]     6.910   0.255   6.385   6.749   6.915   7.077   7.399 1.001  6200
predIV[8]     5.848   0.222   5.406   5.704   5.849   5.993   6.285 1.001  7800
predIV[9]     4.557   0.231   4.098   4.409   4.555   4.707   5.012 1.001  4500
predIV[10]    3.270   0.252   2.781   3.107   3.267   3.433   3.773 1.002  3100
predIV[11]    2.349   0.251   1.867   2.183   2.343   2.509   2.865 1.002  2700
predIV[12]    1.217   0.207   0.839   1.076   1.204   1.343   1.662 1.002  2500
predO[1]      2.138   0.214   1.750   1.995   2.124   2.263   2.599 1.001  8700
predO[2]      3.626   0.293   3.070   3.432   3.616   3.807   4.234 1.001 12000
predO[3]      4.653   0.306   4.050   4.452   4.652   4.847   5.264 1.001 16000
predO[4]      5.351   0.294   4.754   5.161   5.354   5.541   5.930 1.001 15000
predO[5]      6.375   0.277   5.801   6.202   6.383   6.562   6.894 1.002  3200
predO[6]      6.274   0.308   5.629   6.079   6.286   6.485   6.842 1.002  2800
predO[7]      5.455   0.292   4.852   5.270   5.461   5.648   6.018 1.002  3900
predO[8]      4.262   0.245   3.764   4.106   4.265   4.423   4.736 1.001  8600
predO[9]      3.057   0.227   2.605   2.910   3.058   3.207   3.504 1.001  8200
predO[10]     2.195   0.218   1.773   2.050   2.192   2.335   2.638 1.001  5200
predO[11]     1.135   0.180   0.805   1.013   1.127   1.247   1.519 1.002  3500
t0_5         16.798   1.713  13.830  15.655  16.652  17.770  20.617 1.002  2200
ta0_5         1.077   0.164   0.768   0.969   1.072   1.177   1.419 1.002  3200
deviance     38.082   5.042  30.832  34.357  37.207  40.918  50.060 1.003  1200
Stan mixed better, but I should add that with thin=1, the number of effective samples increased dramatically --- Stan's making nearly independent draws each iteration. Wingfeet reports that the "correct" answers (presumably given in the book) were the following, here reported with the 95% posterior intervals produced by Stan and JAGS (we've been meaning to reduce RStan's posterior intervals to 90% to match CmdStan's, but it hasn't happened yet):
Parameter   Correct        Stan 95%        JAGS 95%
---------   -----------    ------------    ------------
t0_5        16.7 hr        (13.8, 20.6)    (13.8, 20.6)
AUCiv       217 mg-hr/L    ( 189,  256)    ( 190,  256)
AUCoral     191 mg-hr/L    ( 172,  231)    ( 172,  232)
CL          2.3 L/hr       ( 2.0,  2.7)    ( 2.0,  2.7)
V           55 L           (51.7, 62.4)    (51.6, 62.5)
F           0.88           (0.77, 0.97)    (0.77, 0.98)
It's nice to see two completely different approaches (HMC and generalized Gibbs/slice) give the same answers.
Bob,
I think one difference is in the priors. I used log normals for V, CL, while you used normal.
Could that be it?
Regards,
Wingfeet
Bob:
Yes, I’m confused about the model too. Shouldn’t V and various other parameters (F, k_a, etc.) be constrained to be positive in the Stan model? I looked at the inference for V and it looks like the posterior is something like 5 prior sd’s away from the prior, which doesn’t seem quite right either.
Yes, they should be constrained. The @#$%^&*I(O) WordPress editor dropped the constraints.
I like the looks of the Stan code. In production systems (not specific to Stan), verbose code is much easier to read and patch, any day.
Typing verbose can be a pain but in those instances you can write a loop to spit out the draft code. I.e. loop the code but avoid loops _in_ the code. (This applies to vectorization too in unstable operating environments with unpredictable exceptions).
What’s up with all the tabular output?
I’d thought we’d gotten rid of those noisy 2.5% and 97.5% quantiles in the Stan output, but I guess not.
I have some weird problems with this model using RStan (with the RStudio IDE). If I use the function stan(), as Bob did, then I do get similar results. But I usually compile my model before sampling, i.e., something like this:
model <- stan_model(model_code=stan_code)
fit <- sampling(object=model,
data=c("nIV","nOral","doseIV","doseOral", "timeIV","concIV","timeOral","concOral"),
chains=4, warmup=5000, iter=14000)
Shouldn't this give exactly the same results as stan()? But what happens here is that I just get a lot of those "informational messages," RStudio becomes very slow to respond, and I basically just have to kill the whole process.
Changing the parameters in the model to the following, plus using lognormal priors for CL, V and F, gives almost the same results as JAGS if I use the function stan():
real CL;
real V;
real F;
real ka;
real kIV;
real cIV;
real sigma;
But results for sampling() are still quite different. Have I misunderstood how sampling() works or something?
Hmm, somehow the forum just deleted my constraints for the parameters… maybe the same thing happened in Bob’s post as well? I.e., anything inside these “less than” and “greater than” signs will get deleted.
Thanks! I’ll update the model and then repost. I love blogs!
I’m afraid that’s the poor comment system of WordPress rearing its ugly head. You need to type “&lt;” to get a less-than sign (<) and “&gt;” to get a greater-than sign (>). Which gives me a chance to plug Mitzi’s new book.
I thought the main body editor for WordPress automatically escaped, but I must be confusing it with GitHub’s markdown editor. I’d really prefer just to enter straight-up HTML without all the “smart” intervention.
Is there any reason that MathJax (http://www.mathjax.org/) isn’t used here? I’ve never used WordPress (a google turned up this: https://wordpress.org/plugins/mathjax-latex/ and this: http://docs.mathjax.org/en/v1.1-latest/platforms/wordpress.html), but enabling MathJax is usually painless!
Not sure it will solve this problem specifically…
I don’t think we need MathJax. WordPress lets you enter $latex \LaTeX$ commands like $latex \int e^x dx$ directly using “$latex …$”. But it’s a pain for comments, because as far as I know, there’s no way to enable comment preview.
The HTML escaping needed in the body for less-than and quotes is different.
This is a test $latex 2^x \sim P$
It worked!
I didn’t know that! Thanks! But yes, I guess without a preview or comment edit feature, it isn’t very helpful.
For less-than, in the text box you type:
&lt;
And it renders like this:
<
This has been a public service announcement brought to you by Pedants Without Borders.
Oh, also brought to you by the Department of Redundancy Department.
Thanks, Wingfeet, Andrew and germo. I fixed the model, re-ran it, and updated the post. Now everything agrees as in all the other cases where we’ve compared JAGS and Stan.
Hello, I am new to RStan. I ran your code because I am interested in pharmacokinetic simulations, and the Rhat value is nowhere close to 1. Did you have any errors while running the model?
I get:
Warning 1: There were 35503 divergent transitions after warmup
Warning 2: The largest R-hat is 2.29, indicating chains have not mixed.
Warning 3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Warning 4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Thanks for inquiring. We were looking for a good example of this kind of pathological behavior. What’s happening is that initialization in the tail can be ill conditioned to the point where the sampler gets stuck. RStan produces appropriate warnings here.
The best thing to do is reparameterize. I took the liberty of removing the upper bound on sigma (which was causing default initialization at the midpoint of the interval) and adding a reasonable weakly informative prior. Then I tightened some of the very, very wide lognormal priors. Then I put a constraint on ka to keep it greater than k, which is required downstream.
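Here's a small Python sketch (my notation, not Stan code) of how a lower bound like ka > k is enforced under the hood: sample an unconstrained value and shift a strictly positive transform of it by the bound.

```python
import math

def constrain_ka(u, k):
    # Enforce ka > k by mapping an unconstrained u through exp
    # (strictly positive) and shifting by the lower bound k.
    return k + math.exp(u)

k = 2.35 / 56.44   # k = CL / V at the posterior means from the fit above
for u in (-5.0, 0.0, 5.0):
    assert constrain_ka(u, k) > k   # holds for every real u
```

In Stan itself this is what a declaration with a parameter-dependent lower bound does, plus a log-Jacobian adjustment to the target density that is omitted here.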
To fit, I just initialized at the location of the lognormal priors.
Here’s the revised model.
and a script to run it in RStan given the above model and data listed in the original post.
Oh, and I meant to add that if you reparameterize so that all parameters are unit scale, the init wouldn't be necessary. That's a pain right now for lower-bounded variables because you have to fiddle with logs/exps and do everything on the unconstrained scale.
For example, we could have this
That rescales V to match the prior. We'll build that in more naturally in a future version by allowing offset and multiplier to combine with lower bounds. As is, we have to move to the log scale, use the log variable in the model with a regular normal, and then do the transform ourselves. Otherwise, the rest of the model stays the same.
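As an illustration of why that works (my sketch, not from the post): a unit-scale normal draw, scaled and shifted on the log scale, has exactly the same distribution as the original lognormal prior.

```python
import math
import random

random.seed(1234)
N = 100_000

# Direct parameterization: V ~ lognormal(2, 10)
direct = [math.exp(random.gauss(2.0, 10.0)) for _ in range(N)]

# Rescaled ("unit-scale") parameterization: draw z ~ normal(0, 1),
# then set V = exp(2 + 10 * z), which is also lognormal(2, 10).
rescaled = [math.exp(2.0 + 10.0 * random.gauss(0.0, 1.0)) for _ in range(N)]

def median(xs):
    ys = sorted(xs)
    return ys[len(ys) // 2]

# Both medians should be near exp(2), about 7.39
print(round(median(direct), 1), round(median(rescaled), 1))
```

The sampler then explores the well-scaled z rather than a parameter whose prior spans many orders of magnitude, which is the point of the rescaling.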
P.S. Why, oh, why does WordPress insist on eating angle brackets in pre environments? Ugh.