Skip to content
 

More on absolute error vs. relative error in Monte Carlo methods

This came up again in a discussion from someone asking if we can use Stan to evaluate arbitrary integrals. The integral I was given was the following:

\displaystyle \alpha = \int_{y \in R\textrm{-ball}} \textrm{multinormal}(y \mid 0, \sigma^2 \textrm{I}) \, \textrm{d}y

where the R-ball is assumed to be in D dimensions so that y \in \mathbb{R}^D.

(MC)MC approach

The textbook Monte Carlo approach (Markov chain or plain old) to evaluating such an integral is to evaluate

\displaystyle \frac{1}{M} \sum_{m=1}^M \textrm{normal}(y^{(m)} \mid 0, \sigma),

where the marginal distribution of each y^{(m)} is

\displaystyle y^{(m)} \sim \textrm{uniform}(R\textrm{-ball}).

I provide a Stan program for doing this below.

If you want to do this without Stan, you can rejection sample points in the ball very easily in low dimensions. In higher dimensions, you can generate a random angle and radius (the random radius needs to be drawn non-uniformly in dimensions higher than 1 to account for increasing area/volume as the radius increases). Stan could’ve also used an angle/radius parameterization, but I’m too lazy to work out the trig.

Most draws can contribute zero

If the volume of the R-ball is large compared to the volume containing the draws you get from \textrm{normal}(0, \sigma), then most of the draws from the R-ball will have roughly zero density in \textrm{normal}(0, \sigma). The effect of dimensionality is exponential on the difference in volume.

Absolute error

Even the volume of the R-ball is large compared to the volume containing the draws you get from \textrm{normal}(0, \sigma), the (MCMC) central limit theorem still controls error. It’s just that the error it controls is squared error. Thus if we measure absolute error, we’re fine. We’ll get an estimate near zero which is right to several decimal places.

Relative error

If the value of the integral is 0.01 or something like that, then to get 10% relative error, we need our estimate to fall within 0.001 of the true answer. 0 doesn’t do that.

What we have to do with a naive application of Monte Carlo methods is make a whole lot of draws to get an estimate between 0.009 and 0.011.

Relation to discrete sampling

This is related to the discrete sampling problem we ran into when trying to estimate the probability of winning the lottery by buying tickets and computing a Monte Carlo estimate. See my previous post, Monte Carlo and winning the lottery. The setup was very hard for readers to swallow when I first posted about it (my experience is that statisticians don’t like thought experiments or simplified hello world examples).

Stan program

The complexity comes in from the constraining trasnform to the R-ball and corresponding Jacobian adjustment to make the distribution in the ball uniform.

For the transform, I used a stick-breaking procedure very similar to how Stan transforms simplex variables and the rows of correlation matrix Cholesky factor. The transform requires a Jacobian, which is calculated on the fly. An approach that could be done with uniform draws purely through a transform would be to draw an angle (uniformly) and radius (non-uniformly based on distance from the origin); this would probably be more efficient.

/**
 * Sampling for this program computes
 *
 *   INTEGRAL_{theta in R ball in N dimensions} normal(theta | 0, sigma) d.theta
 */
data {
  int N;
  real R;
  real sigma;
}
parameters {
  vector<lower = -1, upper = 1>[N] alpha;
}
transformed parameters {
  // transform alpha to the unit ball
  vector[N] theta;

  // log_abs_det_J accumulates the change-of-variables correction
  real log_abs_det_J = 0;
  {
    real sum_sq = 0;
    for (n in 1:N) {
      real mult = sqrt(1 - sum_sq);
      theta[n] = alpha[n] * mult;
      sum_sq += theta[n]^2;
      log_abs_det_J += log(mult);
    }      
  }
  
  // scale from unit ball to R-ball;  no Jacobian because R is const.
  theta *= R;
}  
model {
  // the Jacobian's triangular, so the log determinant is the sum of
  // the log of the absolute derivatives of the diagonal of J, i.e.,
  // sum(log(d.theta[n] / d.alpha[n])), as computed above

  target += log_abs_det_J;

  // theta is now uniform on R-ball
}
generated quantities {
  // posterior mean of E will be:
  //    INTEGRAL_{y in R-ball} normal(y | 0,  sigma) d.y
  //    =approx= 1/M SUM_{m in 1:M}  normal(theta(m) | 0, sigma)
  //             where theta(m) drawn uniformly from R-ball
  real E = exp(normal_lpdf(theta | 0, sigma));
}

As noted in the program inline documentation, the posterior mean of E is the result of the integral of interest.

10 Comments

  1. I haven’t followed this completely, I’m rushing between things, but I thought I’d mention that if you want to draw a point uniformly over the surface of a d hypersphere, then you draw d normal(0,1) variates and form a vector, then normalize the vector to unit length (or draw one unit diagonal multivariate normal and normalize its length)

    in Julia

    d = 10
    dir = rand(MvNormal(zeros(d),1))
    dir = dir/norm(dir)

  2. Carlos Ungil says:

    That integral may be “arbitrary” but it’s very convenient to work with. Instead of changing the variables one could also do importance sampling. The problem becomes quite simple: integrate the function that is one within the R-ball and zero outside sampling from the multivariate normal.

  3. anon says:

    The parameter definition should be vector[n] alpha , right? Otherwise, for example, if alpha[1]^2 > 1 , then, in the first run of the loop, mult is NaN.

  4. Daniel: Stan uses that approach for parameterizing points on a hypersphere. But here we’re drawing points in a hyperball.

    Carlos: I didn’t mean to imply this integral is hard to compute. It was meant to illustrate that when the summands in an MCMC estimate are mostly zero, the absolute error obeys the expected bounds from the MCMC CLT, but the relative error is terrible. I brought up “arbitrary” integrals because we often talk about MCMC as being a way to compute Bayesian posterior expectations for arbitrary functions of the parameters. Those functions have to be well behaved to produce decent relative error using the plug-and-play MCMC estimate,

    $latex \displaystyle\mathbb{E}[f(\theta) \mid y] \approx \frac{1}{M} \sum_{m = 1}^M f(\theta^{(m)})$

    where $latex \theta^{(m)} \sim p(\theta \mid y)$.

    Edit. Hmm. Looks like math doesn’t work in comments. That’s

    E[f(theta) | y] = 1/M SUM_{m in 1:M} f(theta(m))
    

    for

    theta(m) ~ p(theta | y)
    

    Anon: Yes, definition should be

    vector<lower = -1, upper = 1>[N]
    

    Blame WordPress for eating both of our homework.

    • Martha (Smith) says:

      Hmm– maybe we should call Word Press “The Dog”.

    • Carlos Ungil says:

      Bob, I don’t quite understand your discussion about relative and absolute error.

      When you say “The textbook Monte Carlo approach to evaluating such an integral is to evaluate …” you’re missing a factor V (the volume of the domain of integration). The value of the integral is the volume times the average value of the function, the Monte Carlo computation estimates that average. In this example, the value of the integral is (close to) one [*]; the discussion about the value of the integral being 0.01 does seem applicable in this context.

      [*] For high values of R, which is what you’re discussing. See https://imgur.com/a/yPuXJMx for the example of a 10-dimensional standard normal and R from 1 to 10. The true value of the integral is displayed in red, the boxplots show V*f(x) for 1000 points selected uniformly from the hypersphere of radius R and the blue points are 10 Monte Carlo estimates for the integral obtained averaging 100 points each. As R grows the estimate of (the volume times) the mean of the function in the hyperball gets worse because (the volume times) the standard deviation of the function in the hyperbola gets larger.

      • Anonymous says:

        @Carlos: I suspect we’re talking past each other, because my point was very simple. So at the risk of stating the obvious…

        In calculus, the volume over which one integrates is explicit, though in an expectation the function is weighted by the probability density,

        E[f(Theta) | y] = INTEGRAL_{theta in Theta} f(theta) * p(theta | y) d.theta

        With Monte Carlo, the expectation calculation is just

        E[f(Theta) | y] = 1/M SUM_{m in 1:M} f(theta(m)),

        where

        theta(m) ~ p_Theta|Y(theta | y).

        The volume part’s implicit in generating the theta(m) in volumes of high probability mass.

        The MCMC CLT governs the error of the Monte Carlo expectation estimate as

        mcmc-se = sd(theta) / sqrt(ess(theta))

        where ess(theta) is the effective sample size of theta(1), .., theta(M).

        But that’s additive error around the expected value,

        estimate = true_value + err,

        where the error distribution is

        err ~ normal(0, mcmc-se).

      • @Carlos: I suspect we’re talking past each other, because my point was very simple. So at the risk of stating the obvious…

        In calculus, the volume over which one integrates is explicit, though in an expectation the function is weighted by the probability density,

        E[f(Theta) | y] = INTEGRAL_{theta in Theta} f(theta) * p(theta | y) d.theta

        With Monte Carlo, the expectation calculation is just

        E[f(Theta) | y] = 1/M SUM_{m in 1:M} f(theta(m)),

        where

        theta(m) ~ p_Theta|Y(theta | y).

        The volume part’s implicit in generating the theta(m) in volumes of high probability mass.

        The MCMC CLT governs the error of the Monte Carlo expectation estimate as

        mcmc-se = sd(theta) / sqrt(ess(theta))

        where ess(theta) is the effective sample size of theta(1), .., theta(M).

        The error is additive error around the expected value,

        estimate = true_value + err,

        where the error distribution is

        err ~ normal(0, mcmc-se).

        That’s additive error around the true value theta that doesn’t depend on the true value, just the mcmc-se (assuming the estimator’s unbiased, of course). Relative error, on the other hand, is a proportion, usually measured in terms of (estimate – actual) / abs(actual). Of course, that doesn’t work for actual = 0 (which is a huge pain for Stan testing).

        • Carlos Ungil says:

          I don’t think we disagree on the details, but what got me confused is that you start asking about the solution of one integral and then you solve a different (related) one.

          A = Integral_{y in R-ball}[ multinormal(y) dy ]

          (for simplicity I take the standard normal with mean 0 and variance I)

          which would be solved using a stright foward Monte Carlo calculation as

          A = Volume(R-ball) E_{y in R-ball}[ multinormal(y) ] = Volume(R-ball) SUM_{m in 1:M} multinormal(y_m)

          where y_m is sampled uniformly in R-ball.

          The integral that you calculate is a different one and corresponds to the second factor only

          B = Integral_{y in R-ball}[ multinormal(y) p(y) dy ]

          where p(y) is a density function which is constant in R-ball but depends on the domain of integration:

          p(y) = 1/Volume(R-ball).

          The relation between them is of course

          B = 1/Volume(R-ball) Integral_{y in R-ball}[ multinormal(y) dy ] = 1/Volume(R-ball) A

          > That’s additive error around the true value theta that doesn’t depend on the true value, just the mcmc-se (assuming the estimator’s unbiased, of course).

          In this example, looking at your integral B (the average of the multinormal function in the R-ball), as the R-ball gets larger both the expected value and the standard deviation of the Monte Carlo estimator change.

          If we consider integral A the expected value of the Monte Carlo estimator is almost constant (as the R-ball gets larger the integral approaches 1) but the standard deviation increases.

          We can imagine other cases (for example a “scaling”) where both may change but the relative error remains constant and other cases (for example a “translation”) where the standard deviation remains the same but the expected value changes so the relative error may approach to zero or diverge as you mentioned.

Leave a Reply