So. This one is pretty simple. But the general idea could be useful to some of you. So here goes.

We were fitting a model with an autocorrelation parameter, rho, which was constrained to be between 0 and 1. The model looks like this:

eta_t ~ normal(rho*eta_{t-1}, sigma_res), for t = 2, 3, ... T

To make this work out, you need the following marginal distribution for the first element in the series:

eta_1 ~ normal(0, sigma_res/sqrt(1 - rho^2)).
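In case it's helpful, here's where that marginal comes from. At stationarity the marginal variance is the same for every t, so taking variances on both sides of the autoregression gives:

Var(eta_t) = rho^2 * Var(eta_{t-1}) + sigma_res^2

Setting Var(eta_t) = Var(eta_{t-1}) = V and solving:

V = sigma_res^2/(1 - rho^2),

which yields the standard deviation sigma_res/sqrt(1 - rho^2) above.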

In case you’re wondering, eta is a vector of latent parameters deep inside the model. In addition to these autocorrelated error terms, the model also has linear predictors, independent errors, and all sorts of other things. But here we’re focusing on the etas.

It’s easy enough to write the above model in Stan, either by computing the T x T covariance matrix and using the multivariate normal, or by simply building up the distribution one step at a time, as above, which is how we did it.
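For concreteness, here's a minimal sketch of the sequential version (the data block and variable names are mine, not from our actual model):

data {
  int<lower=2> T;
}
parameters {
  real<lower=0, upper=1> rho;
  real<lower=0> sigma_res;
  vector[T] eta;
}
model {
  eta[1] ~ normal(0, sigma_res / sqrt(1 - square(rho)));
  eta[2:T] ~ normal(rho * eta[1:(T - 1)], sigma_res);
}

The vectorized statement for eta[2:T] is equivalent to looping t from 2 to T.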

When fitting the model there was some slow mixing, and we decided it could work better to reparameterize in terms of the marginal variance rather than the residual variance.

Thus, we defined:

sigma = sigma_res/sqrt(1 - rho^2)

And then the above model became:

eta_1 ~ normal(0, sigma)
eta_t ~ normal(rho*eta_{t-1}, sigma*sqrt(1 - rho^2)), for t = 2, 3, ... T

But then there were also issues with rho, having to do with posterior mass piling up near rho = 1. This is a situation that experienced time-series modelers will be familiar with, the so-called unit root problem, that time series often seem to want to be nonstationary. This sort of data issue should be respected, and it typically is a sign that the model needs more structure, for example long-term trends. Here, though, we wanted first to deal with awkward nonlinearity in the posterior dependence of rho and sigma, and we decided to reparameterize as follows:

xi = 1/(1 - rho^2)

When rho ranges from 0 to 1, xi varies from 1 to infinity. So now we don’t have to worry about the density piling up at the boundary.

So, in Stan:

parameters {
  real<lower=1> xi;
  real<lower=0> sigma;
  vector[T] eta;
}
transformed parameters {
  real rho;
  vector[T] M;
  vector[T] S;
  rho = sqrt(1 - 1/xi);
  M[1] = 0;
  S[1] = sigma;
  for (t in 2:T) {
    M[t] = rho * eta[t - 1];
    S[t] = sigma * sqrt(1 - square(rho));
  }
}
model {
  eta ~ normal(M, S);
}

There are various ways the code could be cleaner, especially with future language improvements. For example, I’d like M and S to not have to be named variables; instead they could be attributes of eta in some way.

But that discussion is for another time.

For now, let’s continue with this model. For computational reasons, it is convenient for xi, not rho, to be the parameter in the Stan model. But suppose we want to put a prior on the scale of rho. For example, the model above has an implicit flat prior on xi, but that implies a big mass on high values, as xi is unconstrained on the high end.

To put a flat prior on rho, when the above model is parameterized in terms of xi, we need to add the log Jacobian to the log posterior density, or target function in Stan.

The Jacobian is the absolute value of the derivative of the transformation:

J = |d(xi)/d(rho)| = 2*rho/(1-rho^2)^2

But I always get confused about which direction to include the Jacobian, and I'm always worried that I've gotten my algebra wrong.

So I write a little test program, “transformation_test.stan”:

parameters {
  real<lower=1> xi;
}
transformed parameters {
  real rho;
  rho = sqrt(1 - 1/xi);
}
model {
  target += log(rho) - 2*log(1 - square(rho));
}

And then I run this R script:

library("rstan")
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
fit <- stan("transformation_test.stan")
print(fit)

And here's what we get:

Inference for Stan model: transformation_test.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean      se_mean           sd         2.5%          25%          50%          75%        97.5% n_eff Rhat
xi   1.076634e+16 2.917436e+14 4.410165e+15 3.446772e+15 6.921461e+15 1.086366e+16 1.463243e+16 1.765822e+16   229 1.03
rho  1.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00   749 1.00
lp__ 1.088500e+02 4.000000e-02 6.900000e-01 1.064800e+02 1.085600e+02 1.090100e+02 1.093100e+02 1.095000e+02   338 1.02

Damn! I guess I did the Jacobian in the wrong direction.

Let's try again:

parameters {
  real<lower=1> xi;
}
transformed parameters {
  real rho;
  rho = sqrt(1 - 1/xi);
}
model {
  target += -log(rho) + 2*log(1 - square(rho));
}

And now:

      mean se_mean    sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
xi    6.33    2.48 93.92  1.00  1.07  1.36  2.25 17.34  1430    1
rho   0.50    0.01  0.29  0.02  0.26  0.51  0.75  0.97  1358    1
lp__ -1.60    0.03  0.91 -4.23 -1.83 -1.25 -1.02 -0.96   712    1

Yesssss! It works. rho has a uniform distribution, which was our goal.
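For the record, here's the algebra confirming that second version. We want rho to be uniform, so the implied density on xi is p(xi) = p(rho) * |d(rho)/d(xi)| = 1/J, and on the log scale:

log p(xi) = -log(2*rho/(1 - rho^2)^2) = -log(rho) + 2*log(1 - rho^2) - log(2).

The constant log(2) can be dropped, leaving exactly the target increment in the second program.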

The whole procedure, including several debugging steps not shown here, took about 10 minutes. (In contrast, it took half an hour to write all this up.)

OK, we've succeeded in implementing the trivial. So what?

I'm not claiming this is some great accomplishment. Rather, this is the sort of thing we need to do sometimes. And so it's good to have a way to check our work.

Now that we've succeeded in the reparameterization, we can add a prior on rho directly. For example, if we want normal(0.7, 0.2), we just add it to the model block:

rho ~ normal(0.7, 0.2);

Or, equivalently:

target += normal_lpdf(rho | 0.7, 0.2);

In either case, the constraint to the range [0, 1] is implicit, having already been achieved in the declaration step.

And now we can move on and continue with the rest of our modeling.

Thanks for posting this. I was unaware of the phrase “unit-root problem”, but I’ve encountered it many times, especially in scenarios such as you describe.

Those 10 minutes were while we were having a pretty concentrated discussion in which Andrew was the principal discussant!

I suppose the puzzling thing here is that defining rho with hard constraints on (0,1) also samples in an unconstrained space with Stan. So the code seems to be re-inventing the wheel… or at least it’s not clear why the hard constraint wouldn’t also address this problem.

Bob K:

Hmmm, you might be right about this! It’s a different transformation (logit rather than 1/(1-rho^2)), but maybe the logit would do the job too here. I hadn’t thought about that.

Yeah, there should be no piling up issue in the internal unconstrained space that Stan uses, so why do you have to manually unconstrain this variable? One suspects that this “fix” may be helping, but not really for the reasons you think it is.

I like to put a logit-normal prior on rho … I have no idea if this is “good” statistical practice though.

1. The first program is missing `lower = 0` on `sigma` declaration. As Andrew keeps saying, we need a linter (what he calls “pedantic mode”).

2. For now, log1m(x) is more efficient and stable than log(1 - x). We’ll automate that soon when the OCaml parser goes through.

I prefer this version of the first program:

It’ll also be much more efficient because (a) it vectorizes the multiplication by rho, and (b) it uses a scalar scale parameter for eta[2:T] rather than duplicating the sqrt, square, and multiplication ops to create the scale argument for each element.
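[The code in this comment was eaten by WordPress. Based on the description above, the preferred version presumably looks something like this:]

parameters {
  real<lower=1> xi;
  real<lower=0> sigma;
  vector[T] eta;
}
transformed parameters {
  real rho = sqrt(1 - 1/xi);
}
model {
  eta[1] ~ normal(0, sigma);
  eta[2:T] ~ normal(rho * eta[1:(T - 1)], sigma * sqrt(1 - square(rho)));
}

Here the scale for eta[2:T] is a single scalar expression computed once, rather than a vector S built element by element.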

Bob:

No, the lower=0 and lower=1 are not missing at all. They just got removed from the html because of the angle brackets.

Is there a way to alter the “pre” tag in html to not swallow up the angle brackets? I know that I could change to that “gt” and “lt” thing but that’s ugly because then I can’t just copy in the code.

Nope. I had to update my comment, too. I thought <pre> was supposed to deal with this, so maybe it’s a WordPress thing.

– Should ρ alone be called the auto-correlation parameter? Or is this conventional terminology? If σ is close to 0 and ρ > 0, then the auto-correlation between x_t and x_{t+1} will be 1 no matter how small ρ is. So σ and ρ together determine the auto-correlation in the sequence.

– Currently, transformation_test.stan (as visible on the blog) fails to converge because the constraints were gobbled up by WordPress. I asked about this on the Stan forum [here](https://discourse.mc-stan.org/t/reproducing-the-parameter-transformation-example-from-the-statmodeling-blog/7527), but after reading the comments above I figured out what went wrong.

Thanks for posting this. I do a decent amount of time series modelling. Because of the need to put constraints on parameters, it can be quite challenging to get reasonable fits in Stan.

I’m not sure whether it always works, but this seems to be a simple way to decide whether you need to worry about a Jacobian in Stan:

If you declare something in the “parameters” block but put the associated prior on something from the “transformed parameters” block, then you also need a Jacobian.

That’s right. It’s also true if the thing you put the prior on is a local variable in the model block or just a function on the left-hand side of a sampling statement.

Also, using an AR1 structure on parameters is going to lead to a lot of dependence among them and presumably smaller step sizes and lower effective sample sizes. It is easy and effective to declare the innovations as the parameters and build the AR1 structure in the transformed parameters block with
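[The code in this comment was also swallowed by WordPress; here is a sketch of the innovations idea, with made-up variable names:]

data {
  int<lower=2> T;
}
parameters {
  real<lower=0, upper=1> rho;
  real<lower=0> sigma;
  vector[T] z;  // independent standard-normal innovations
}
transformed parameters {
  vector[T] eta;
  eta[1] = sigma * z[1];
  for (t in 2:T)
    eta[t] = rho * eta[t - 1] + sigma * sqrt(1 - square(rho)) * z[t];
}
model {
  z ~ normal(0, 1);
}

This gives eta the same AR1 distribution as before, but the declared parameters z are a priori independent, which tends to make the posterior geometry easier for the sampler.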

[edit: replaced markdown with html to make WordPress happy.]

Browsing around on the web it seems that just plain pre does the job.

Bob

___________________________

—————————

\ ^__^

\ (oo)\_______

(__)\ )\/\

||—-w |

|| ||

This is coded as

<pre>

___________________________

—————————

\ ^__^

\ (oo)\_______

(__)\ )\/\

||—-w |

|| ||

</pre>Well, what works in an HTML preview doesn’t seem to work here.

I wish this blog had either (1) a preview option or (2) the capability for a user to delete a post that they made.

Bob.

We (those of us with keys to the blog) can edit comments, but general users can’t. And yes, WordPress mangles basic HTML. Markdown’s just a total mess in general. Does anyone know of a WordPress plugin or something we could use for previews?