Skip to content
 

Transforming parameters in a simple time-series model; debugging the Jacobian

So. This one is pretty simple. But the general idea could be useful to some of you. So here goes.

We were fitting a model with an autocorrelation parameter, rho, which was constrained to be between 0 and 1. The model looks like this:

eta_t ~ normal(rho*eta_{t-1}, sigma_res), for t = 2, 3, ... T

To make this work out, you need the following marginal distribution for the first element in the series:

eta_1 ~ normal(0, sigma_res/sqrt(1 - rho^2)).

In case you’re wondering, eta is a vector of latent parameters deep inside the model. In addition to these autocorrelated error terms, the model also has linear predictors, independent errors, and all sorts of other things. But here we’re focusing on the etas.

It’s easy enough to write the above model in Stan, either by computing the T x T covariance matrix and using the multivariate normal, or by simply building up the distribution one step at a time, as above, which is how we did it.

When fitting the model there was some slow mixing, and we decided it could work better to reparameterize in terms of the marginal variance rather than the residual variance.

Thus, we defined:

sigma = sigma_res/sqrt(1 - rho^2)

And then the above model became:

eta_1 ~ normal(0, sigma)
eta_t ~ normal(rho*eta_{t-1}, sigma*sqrt(1 - rho^2)), for t = 2, 3, ... T

But then there were also issues with rho, having to do with posterior mass piling up near rho = 1. This is a situation that experienced time-series modelers will be familiar with, the so-called unit root problem, that time series often seem to want to be nonstationary. This sort of data issue should be respected, and it typically is a sign that the model needs more structure, for example long-term trends. Here, though, we wanted first to deal with awkward nonlinearity in the posterior dependence of rho and sigma, and we decided to reparameterize as follows:

xi = 1/(1 - rho^2)

When rho ranges from 0 to 1, xi varies from 1 to infinity. So now we don’t have to worry about the density piling up at the boundary.

So, in Stan:

parameters {
  real xi;
  real sigma;
  vector[T] eta;
}
transformed parameters {
  real rho;
  vector[T] M;
  vector[T] S;
  rho = sqrt(1 - 1/xi);
  M[1] = 0;
  S[1] = sigma;
  for (t in 2:T){
    M[t] = rho*eta[t-1]
    S[t] = sigma*sqrt(1 - square(rho));
  }
}
model {
  eta ~ normal(M, S);
}

There are various ways the code could be cleaner, especially with future language improvements. For example, I’d like M and S to not have to be named variables; instead they could be attributes of eta in some way.

But that discussion is for another time.

For now, let’s continue with this model. For computational reasons, it is convenient for xi, not rho, to be the parameter in the Stan model. But suppose we want to put a prior on the scale of rho. For example, the model above has an implicit flat prior on xi, but that implies a big mass on high values, as xi is unconstrained on the high end.

To put a flat prior on rho, when the above model is parameterized in terms of xi, we need to add the log Jacobian to the log posterior density, or target function in Stan.

The Jacobian is the absolute value of the derivative of the transformation:

J = |d(xi)/d(rho)|
  = 2*rho/(1-rho^2)^2

But I always get confused which direction to include the Jacobian, also I’m always worried that I get my algebra wrong.

So I write a little test program, “transformation_test.stan”:

parameters {
  real xi;
}
transformed parameters {
  real rho;
  rho = sqrt(1 - 1/xi);
}
model {
  target += log(rho) - 2*log(1 - square(rho));
}

And then I run this R script:

library("rstan")
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

fit <- stan("transformation_test.stan")
print(fit)

And here's what we get:

Inference for Stan model: transformation_test.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean      se_mean           sd         2.5%          25%          50%          75%        97.5% n_eff Rhat
xi   1.076634e+16 2.917436e+14 4.410165e+15 3.446772e+15 6.921461e+15 1.086366e+16 1.463243e+16 1.765822e+16   229 1.03
rho  1.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00   749 1.00
lp__ 1.088500e+02 4.000000e-02 6.900000e-01 1.064800e+02 1.085600e+02 1.090100e+02 1.093100e+02 1.095000e+02   338 1.02

Damn! I guess I did the Jacobian in the wrong direction.

Let's try again:

parameters {
  real xi;
}
transformed parameters {
  real rho;
  rho = sqrt(1 - 1/xi);
}
model {
  target += - log(rho) + 2*log(1 - square(rho));
}

And now:

      mean se_mean    sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
xi    6.33    2.48 93.92  1.00  1.07  1.36  2.25 17.34  1430    1
rho   0.50    0.01  0.29  0.02  0.26  0.51  0.75  0.97  1358    1
lp__ -1.60    0.03  0.91 -4.23 -1.83 -1.25 -1.02 -0.96   712    1

Yesssss! It works. rho has a uniform distribution, which was our goal.

The whole procedure, including several debugging steps not shown here, took about 10 minutes. (In contrast, it took half hour to write all this up.)

OK, we've succeeded in implementing the trivial. So what?

I'm not claiming this is some great accomplishment. Rather, this is the sort of thing we need to do sometimes. And so it's good to have a way to check our work.

Now that we've succeeded in the reparameterization, we can add a prior on rho directly. For example, if we want normal(0.7, 0.2), we just add it to the model block:

  rho ~ normal(0.7, 0.2);

Or, equivalently:

  target += normal(rho | 0.7, 0.2);

In either case, the constraint to the range [0, 1] is implicit, having already been achieved in the declaration step.

And now we can move on and continue with the rest of our modeling.

17 Comments

  1. Garnett says:

    Thanks for posting this. I was unaware of the phrase “unit-root problem”, but I’ve encountered it many times, especially in scenarios such as you describe.

  2. That 10m was while we were having a pretty concentrated discussion in which Andrew was the principal discussant!

  3. Bob K says:

    I suppose the puzzling thing here is that defining rho with hard constraints on (0,1) also samples in an unconstrained space with Stan. So the code seems to be re-inventing the wheel… or at least it’s not clear why the hard constraint wouldn’t also address this problem.

  4. 1. The first program is missing `lower = 0` on `sigma` declaration. As Andrew keeps saying, we need a linter (what he calls “pedantic mode”).

    2. For now, log1m(x) is more efficient and stable than log(1 – x). We’ll automate that soon when the OCaml parser goes through.

    I prefer this version of the first program:

    parameters {
      real xi;
      real<lower = 0> sigma;
      vector[T] eta;
    }
    model {
      real rho = sqrt(1 - inv(xi));
      eta[1] ~ normal(0, sigma);
      eta[2:T] ~ normal(rho * eta[1: T - 1], sigma * sqrt(1 - square(rho)));
    }
    

    It’ll also be much more efficient because (a) it vectorizes multiplication by rho, and (b) uses a scalar scale parameter for eta[2:T] rather than dupliating the sqrt, square and multiplication ops to create the argument then the log in the evaluation of it.

  5. John Hall says:

    Thanks for posting this. I do a decent amount of time series modelling. Due to the necessity of putting constraints on parameters, this can get quite challenging to get reasonable fits in Stan.

  6. Ed says:

    I’m not sure whether it always works, but this seems to be a simple way to decide whether you need to worry about a Jacobian in Stan:

    If you declare something in the “parameters” block but put the associated prior on something from the “transformed parameters” block, then you also need a Jacobian.

  7. Ben Goodrich says:

    Also, using an AR1 structure on parameters is going to lead to a lot of dependence among them and presumably smaller step sizes and lower effective sample sizes. It is easy and effective to declare the innovations as the parameters and build the AR1 structure in the transformed parameters block with

    parameters {
      ...
      vector[T] epsilon; // innovations
    }
    transformed parameters {
      ...
      vector[T] eta;
      eta[1] = epsilon[1];
      for (t in 2:T) eta[t] = rho * eta[t - 1] + epsilon[t];
    }
    

    [edit: repladed markdown with html to make WordPress happy.]

  8. Bob says:

    Browsing around on the web it seems that just plain pre does the job.
    Bob

    ___________________________

    —————————
    \ ^__^
    \ (oo)\_______
    (__)\ )\/\
    ||—-w |
    || ||

    This is coded as

    &ltpre&gt

    ___________________________

    —————————
    \ ^__^
    \ (oo)\_______
    (__)\ )\/\
    ||—-w |
    || ||

    &lt/pre&gt

Leave a Reply