Neel Shah: modeling skewed and heavy-tailed data as approximate normals in Stan using the LambertW function

Neel Shah, one of Stan’s Google Summer of Code (GSoC) interns, writes:

Over the summer, I will add LambertW transforms to Stan, which enable us to model skewed and heavy-tailed data as approximate normals. This post motivates the idea and describes the theory of LambertW × Z random variables.

Though the normal distribution is one of our go-to tools for modeling, the real world often generates observations that are inconsistent with it. The data might look asymmetric around a central value rather than bell-shaped, or contain extreme values that would be discounted under a normality assumption. When we can’t assume normality, we often have to roll up our sleeves and build a more complex model. But with LambertW × Z random variables, we can capture the skewness and kurtosis in the data, continue with our model as if we had a normal distribution, and later back-transform predictions to account for that skewness and kurtosis.
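
As a sketch of the formulas involved (this follows Goerg’s heavy-tail parameterization, which I’m taking as the reference here; the skewed variant replaces the exponential factor with e^(γU)):

```latex
% Lambert W is defined as the inverse of z -> z e^z:
W(z)\, e^{W(z)} = z

% Heavy-tail LambertW x Gaussian: inflate a latent normal X.
Y = U e^{\frac{\delta}{2} U^2} \sigma_x + \mu_x,
\qquad U = \frac{X - \mu_x}{\sigma_x}, \quad X \sim \mathcal{N}(\mu_x, \sigma_x^2)

% Back-transform (Gaussianize) an observed y, for delta > 0:
U = \operatorname{sgn}(z)\, \sqrt{\frac{W(\delta z^2)}{\delta}},
\qquad z = \frac{y - \mu_x}{\sigma_x}
```

The back-transform works because z = U e^((δ/2)U²) implies δz² = (δU²) e^(δU²), which is exactly the form that W inverts.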

In the first part, we introduce the LambertW function, also known as the product logarithm. Next, we discuss skewness and kurtosis (measures of asymmetry and heavy-tailedness), define the LambertW × Z random variables, and share our implementation plans. Finally, we demonstrate how LambertW transforms can be used for location-hypothesis testing with Cauchy-simulated data.
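
To give a taste of that demonstration, here is a minimal sketch in Python rather than Stan (the median/MAD location and scale estimates and the fixed tail parameter δ = 1 are assumptions made here for illustration; a Cauchy behaves roughly like the δ = 1 member of the heavy-tail family):

```python
# Gaussianize Cauchy draws with the Lambert W function, then run an
# ordinary one-sample t-test on the back-transformed values.
import numpy as np
from scipy.special import lambertw
from scipy.stats import cauchy, ttest_1samp

rng = np.random.default_rng(42)
y = cauchy.rvs(loc=0.5, scale=1.0, size=500, random_state=rng)

# Rough location/scale estimates (assumed for illustration).
mu = np.median(y)
sigma = 1.4826 * np.median(np.abs(y - mu))  # normal-consistent MAD
delta = 1.0                                 # assume Cauchy-like tails

# Inverse transform: u = sgn(z) * sqrt(W(delta * z^2) / delta).
z = (y - mu) / sigma
u = np.sign(z) * np.sqrt(lambertw(delta * z**2).real / delta)
x = u * sigma + mu  # approximately normal version of y

print(ttest_1samp(x, popmean=0.0))
```

The raw Cauchy draws have no finite mean for a t-test to estimate; the Gaussianized values do, which is what makes standard location tests usable again.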

To simplify matters, we focus on skewed and/or heavy-tailed probabilistic systems driven by Gaussian random variables. However, the LambertW function can also be used to back-transform non-Gaussian latent input. Because Stan allows us to sample from arbitrary distributions, we anticipate that LambertW transforms will fit naturally into many workflows.

The full story, with formulas and graphs, is here. It’s great to see what’s being done in the Stan community!

5 thoughts on “Neel Shah: modeling skewed and heavy-tailed data as approximate normals in Stan using the LambertW function”

  1. Not only the Lambert W but also the inverse hyperbolic sine function allows ‘approximate’ normals. I’ve worked extensively with both of these transformations and compared them with quantile regression. At the end of the day, I came to the conclusion that, on the whole, QR provided a better solution (greater accuracy and dependability).

  2. I think the non-principal branch of the LambertW function is going to be more relevant in Stan than it was in Goerg’s papers, but it is fine to start with the principal branch. The Hamiltonian Monte Carlo algorithm in Stan is really good at finding problematic regions of the parameter space, such as those that switch sides of a branch cut and thus yield divergent transitions.

    • Yeah, we’ve been thinking about how to switch from the principal to the non-principal branch, but we can tackle that once we get to that point. I’m not sure how HMC is going to handle it since, as you say, switching between the principal and non-principal branches creates a discontinuity. (A small sketch of the two real branches follows below.)
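
    For anyone following the branch discussion, here is a small sketch of the two real branches using SciPy’s lambertw and its branch index k (k = 0 is the principal branch, k = -1 the other real branch; they coexist as real solutions only on [-1/e, 0), which is where branch switching can occur):

    ```python
    import numpy as np
    from scipy.special import lambertw

    z = -0.2                      # a point in (-1/e, 0)
    w0 = lambertw(z, k=0).real    # principal branch, roughly -0.259
    wm1 = lambertw(z, k=-1).real  # non-principal branch, roughly -2.543
    for w in (w0, wm1):
        print(w, w * np.exp(w))   # both satisfy W(z) * exp(W(z)) = z
    ```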

  3. If I’m reading these papers correctly, the proposal is to use a LambertW transform on heavy-tailed data and then treat the transformed data as Gaussian? If I were modeling something heavy-tailed and skewed with Stan, I would directly implement the log probability of some distributional family that can exhibit those properties, e.g., Cauchy, Pareto-normal mixture, sinh-arcsinh, etc. (a sketch of that direct approach follows at the end of this comment). It seems like this transformation would be more useful if we want to use inflexible statistical tools that assume Gaussian distributions. For example, Georg M. Goerg (what a name!!!) writes:

    One way to overcome this shortcoming is to replace M_N with a new model M_G, where G is a heavy tail distribution:
    (i) regression with Cauchy errors [10];
    (ii) forecasting long memory processes with heavy tail innovations [11, 12], or ARMA modeling of electricity loads with hyperbolic noise [13]. See also Adler et al. [14] for a wide range of statistical applications and methodology for heavy-tailed data.

    While such fundamental approaches are attractive from a theoretical perspective, they can become unsatisfactory from a practical point of view. Many successful statistical models and techniques assume Normality, their theory is very well understood, and many algorithms are implemented for the simple and often much faster Gaussian case. Thus developing models based on an entirely unrelated distribution G is like throwing out the (Gaussian) baby with the bathwater.

    In Stan, isn’t throwing out the (Gaussian) baby with the bathwater the whole point?
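
    To make that alternative concrete, here is a minimal sketch in Python of fitting a heavy-tailed family directly by maximum likelihood (illustrative only; in Stan this modeling choice is essentially the one-liner y ~ cauchy(mu, sigma);):

    ```python
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import cauchy

    rng = np.random.default_rng(0)
    y = cauchy.rvs(loc=0.5, scale=1.0, size=500, random_state=rng)

    # Negative log likelihood of a Cauchy; optimizing log(sigma)
    # keeps the scale positive without explicit bounds.
    def neg_log_lik(params):
        mu, log_sigma = params
        return -cauchy.logpdf(y, loc=mu, scale=np.exp(log_sigma)).sum()

    fit = minimize(neg_log_lik, x0=[0.0, 0.0])
    print(fit.x[0], np.exp(fit.x[1]))  # location and scale estimates
    ```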
