Time series likelihood with non-Gaussian errors

David Ross writes,

When you consider a linear/nonlinear model where two errors are added, e.g.,
Y = f(\beta) * X + u + \varepsilon,
where u ~ AR(1) with non-Gaussian (e.g., Laplace) noise and \varepsilon ~ some other distribution (at least NOT AR). The point is that I only know that the combined distribution of u and \varepsilon follows Laplace.
Now there are two questions:
1. What distribution could \varepsilon follow such that u + \varepsilon follows a Laplace distribution?
2. In this model structure, what would be a way to compute the likelihood?

My reply:

1. I don’t know what \varepsilon is. I’ll assume it represents white noise.

2. I don’t know that you can take a convolution of two reasonable distributions and get the sharp peak of the Laplace distribution. But the real point is, I doubt you should be so set on this particular distribution. I’d think it would make sense to model u and \varepsilon separately and then just let their convolution be whatever it is.

3. You can write the joint likelihood of everything but it might not be so easy to integrate out the u’s to get the marginal likelihood of the hyperparameters. That’s why we end up using Gibbs, EM, etc. I don’t know the literature on time series with non-Gaussian errors but there must be a lot out there.
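
To spell out that last point in the notation above (a sketch, assuming independent white-noise \varepsilon and AR(1) u with coefficient \phi, and writing \theta for all the parameters): the joint likelihood factors as

p(y, u | \theta) = p(u_1) \prod_t p_\varepsilon(y_t - f(\beta) x_t - u_t) \prod_{t>1} p_u(u_t - \phi u_{t-1}),

and the marginal likelihood of \theta requires integrating out the whole latent series,

p(y | \theta) = \int p(y, u | \theta) du_1 \cdots du_T,

a T-dimensional integral with no closed form once these densities are non-Gaussian. That’s where Gibbs, EM, particle methods, etc., come in.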

8 thoughts on “Time series likelihood with non-Gaussian errors”

  1. Is the combined distribution of u and \varepsilon the joint distribution, or the distribution of their sum? Are they independent? If it is the sum and they are independent, I think the easiest way is to characterize the distribution of \varepsilon using the characteristic function, as sketched below.
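
    For instance (a sketch, assuming independence and that the sum is Laplace with scale b): the Laplace characteristic function is \phi_{u+\varepsilon}(t) = 1/(1 + b^2 t^2), so \phi_\varepsilon(t) = \phi_{u+\varepsilon}(t) / \phi_u(t), and this ratio must itself be a valid characteristic function. If u were Gaussian with variance \sigma^2, the ratio would be exp(\sigma^2 t^2 / 2) / (1 + b^2 t^2), which is unbounded in t and hence not a characteristic function, so no such \varepsilon would exist in that case.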

  2. “I don't know that you can take a convolution of two reasonable distributions and get the sharp peak of the Laplace distribution.”

    This doesn't deal with the original Q. (where u is Laplace), but – depending on your criterion for "reasonable" – there are straightforward convolutions of two variables that yield Laplace (double exponential).

    Let X1, X2, X3, X4 all be i.i.d. standard normal.

    Further, let
    Y1 = X1 . X2 and
    Y2 = X3 . X4

    Then Y1 and Y2 are i.i.d., and
    U = Y1 + Y2 is Laplace (a quick simulation check follows below).

    This example is somewhere in Kendall and Stuart (I believe I saw it in the 3rd edition).

    [If I recall correctly (and I'm not going to sit and do the algebra right now to check) the Y's might even be chi-squared(1) variables with a random sign attached, but don't quote me since it's been a while since I looked at this one.]
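
    A quick simulation check of this construction (a sketch; the sample size is arbitrary):

    n = 1e6
    x1 = rnorm(n); x2 = rnorm(n); x3 = rnorm(n); x4 = rnorm(n)
    u = x1*x2 + x3*x4                         # sum of two products of standard normals
    plot(density(u))                          # kernel density of U
    curve(exp(-abs(x))/2, add=TRUE, col=2)    # standard Laplace density overlay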

  3. I'm currently wrestling with a very similar problem.

    In my problem, independent experiments suggest that (in David's notation) \varepsilon is Laplace, and u is AR, which I am assuming (but do not know) is Gaussian. However, data-model residuals are also compatible with u + \varepsilon being Laplace. So I'm not sure how to characterize the distribution of u by itself.

    Right now I'm approximating the problem with a single Laplace AR time series for u, leaving out \varepsilon as a separate source of error. I'm interested in being able to treat the two sources of error separately.

    The problem, as Andrew notes, is that even if you know the distributions of u and \varepsilon, it's hard to write down the marginal likelihood: you have the latent AR time series (\beta X + u) which you have to integrate out. I don't know how to do that efficiently for the large time series I have. Maybe you could attack it with sequential Monte Carlo, with which I'm not very familiar, but I also have to estimate the \beta's, and I don't know how to handle dynamic state and static parameter estimation together. I'm not optimistic about finding an analytic solution, even for something simple like Gaussian u and Laplace \varepsilon.
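
    For concreteness, here is a minimal bootstrap particle filter for this kind of setup (a sketch only, assuming Gaussian AR(1) u with coefficient phi and innovation sd sigma, Laplace(0, b) \varepsilon, and known parameters; the names are illustrative):

    # estimate log p(y | beta, phi, sigma, b) for the model
    # y[t] = beta*x[t] + u[t] + eps[t], u[t] = phi*u[t-1] + N(0, sigma^2),
    # eps[t] ~ Laplace(0, b)
    pf_loglik = function(y, x, beta, phi, sigma, b, N = 1000) {
      u = rnorm(N, 0, sigma / sqrt(1 - phi^2))     # draw from the stationary distribution
      ll = 0
      for (t in seq_along(y)) {
        u = phi * u + rnorm(N, 0, sigma)           # propagate AR(1) particles
        w = exp(-abs(y[t] - beta * x[t] - u) / b) / (2 * b)   # Laplace observation weights
        ll = ll + log(mean(w))                     # marginal likelihood increment
        u = sample(u, N, replace = TRUE, prob = w) # multinomial resampling
      }
      ll
    }

    The static parameters (beta, phi, sigma, b) still have to be estimated on top of this noisy likelihood estimate, which is exactly the dynamic-state-plus-static-parameter difficulty.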

    David, if you make any progress on this problem or want to discuss it further, could you contact me (e-mail in my URL)?

    efrqiue: I've heard the difference of two chi-squared variables is Laplace.

  4. The comment about chi-squared variables is right (at least "empirically"):

    x1 = rchisq(10000, df=1)                  # draws from chi^2_1
    x2 = rchisq(10000, df=1)
    x3 = x1 - x2                              # difference of the two
    plot(density(x3))                         # kernel density of the difference
    curve(exp(-abs(x))/2, add=TRUE, col=2)    # standard Laplace density overlay

    That suggests that the last line above should be U = Y1 - Y2 (not Y1 + Y2).

    This makes sense if you think of a chi-squared with 1 df as gamma(shape=1/2, scale=2). In some very crude sense, adding (convolving) two gammas should add the shape parameters, leaving you with an exponential. I'm not sure how the difference fits in; too lazy to do the derivation right now.

    Ben

  5. The construction efrqiue mentions is correct. This can be seen neatly by representation, using the fact that if Z, W are i.i.d. N(0,1), then Z + W and Z - W are also i.i.d. normals (each N(0,2)).

    Writing X_1 X_2 = [(X_1 + X_2)^2 - (X_1 - X_2)^2]/4, it follows that X_1 X_2 is half the difference of two indep. chi^2_1, and thus that X_1 X_2 + X_3 X_4 is half the difference of two indep. chi^2_2, which is the difference of two independent Exponential(1) variables, which is in turn standard Laplace.

  6. Yeah, in my original response I hadn't thought of convolving distributions such as the chi^2_1 that have vertical asymptotes.

  7. Indeed, the anonymous post at July 9, 2008 8:07 PM is absolutely correct. I calculated it by hand, and the calculation shows that

    if X1, X2, X3, X4 ~ i.i.d. N(0,1),

    Y1 = X1 . X2 and
    Y2 = X3 . X4,

    then Y1 and Y2 are i.i.d., and Y1 + Y2 ~ standard classical Laplace.
