Neel Shah, one of Stan’s Google Summer of Code (GSOC) interns, writes:

Over the summer, I will add LambertW transforms to Stan which enable us to model skewed and heavy-tailed data as approximate normals. This post motivates the idea and describes the theory of LambertW × Z random variables.

Though the normal distribution is one of our go-to tools for modeling, the real world often generates observations that are inconsistent with it: the data might be asymmetric around a central value rather than bell-shaped, or contain extreme values that a normality assumption would discount. When we can't assume normality, we often have to roll up our sleeves and build a more complex model. With LambertW × Z random variables, however, we can estimate the skewness and kurtosis from the data, transform the data to approximate normality, and proceed with our model as if we had a normal distribution. Later, we back-transform predictions to restore the skewness and kurtosis.

In the first part, we introduce the LambertW function, also known as the product logarithm. Next, we discuss skewness and kurtosis (measures of asymmetry and heavy-tailedness), define the LambertW × Z random variables, and share our implementation plans. Finally, we demonstrate how LambertW transforms can be used for location-hypothesis testing with Cauchy-simulated data.
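As a quick illustration (mine, not from the post), the Lambert W function is the inverse of w ↦ w·eʷ, which is why it is also called the product logarithm; SciPy exposes it as `scipy.special.lambertw`:

```python
import numpy as np
from scipy.special import lambertw

# W(x) is defined by W(x) * exp(W(x)) = x.
x = 3.0
w = lambertw(x).real  # principal branch; real-valued for x >= -1/e
assert np.isclose(w * np.exp(w), x)
```

The function returns a complex value by construction, so `.real` is taken here; on the principal branch with x ≥ −1/e the imaginary part is zero.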

To simplify matters, we are focusing on the case of skewed and/or heavy-tailed probabilistic systems driven by Gaussian random variables. However, the LambertW function can also be used to back-transform non-Gaussian latent input. Because Stan allows us to sample from arbitrary distributions, we anticipate that LambertW transforms would naturally fit into many workflows.
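To make the back-transform idea concrete, here is a small sketch (my own, not the project's implementation) of the heavy-tail transform from Goerg's papers for a standard-normal input Z: the forward map Y = Z·exp(δZ²/2) inflates the tails, and the Lambert W function inverts it exactly. The value of δ below is chosen purely for illustration.

```python
import numpy as np
from scipy.special import lambertw

rng = np.random.default_rng(1)
delta = 0.2  # tail parameter; delta = 0 recovers the normal

z = rng.standard_normal(10_000)       # latent Gaussian input
y = z * np.exp(delta * z**2 / 2)      # forward transform: heavy-tailed output

# Back-transform: solve y = z * exp(delta * z**2 / 2) for z.
# Squaring gives delta * y**2 = (delta * z**2) * exp(delta * z**2),
# so delta * z**2 = W(delta * y**2), with the sign of z recovered from y.
z_back = np.sign(y) * np.sqrt(lambertw(delta * y**2).real / delta)
assert np.allclose(z_back, z)
```

Since δ·y² ≥ 0, the principal branch of W suffices here; the non-principal branch only enters for more general inputs, as the comments below discuss.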

The full story, with formulas and graphs, is here. It’s great to see what’s being done in the Stan community!

Not only the Lambert W but also the inverse hyperbolic sine function allows 'approximate' normals. I've worked extensively with both of these transformations and compared them with quantile regression. At the end of the day, I came to the conclusion that, on the whole, QR provided the better solution (greater accuracy and dependability).
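For comparison, the inverse hyperbolic sine transform mentioned here is a one-liner (a sketch of my own, not the commenter's code): asinh behaves like log(2|x|) in the tails, so it tames heavy-tailed samples, though the result is only "approximately" normal.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(10_000)  # heavy-tailed sample
y = np.arcsinh(x)                # asinh(x) = log(x + sqrt(x**2 + 1)); logarithmic tails

# The transformed sample is symmetric with a compact body;
# the extreme Cauchy draws are compressed by orders of magnitude.
assert np.abs(y).max() < np.abs(x).max()
```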

I think the non-principal branch of the LambertW function is going to be more relevant with Stan than it was in Goerg's papers, but it is fine to start with the principal branch. The Hamiltonian Monte Carlo algorithm in Stan is really good at finding problematic regions of the parameter space, such as those that switch sides of a branch cut and thus yield divergent transitions.
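For reference, both real branches are available through the `k` argument of `scipy.special.lambertw` (an illustration I'm adding, not part of the discussion): for x in (−1/e, 0) the equation w·eʷ = x has two real solutions, one on each branch.

```python
import numpy as np
from scipy.special import lambertw

x = -0.2                       # in (-1/e, 0): two real solutions of w * exp(w) = x
w0 = lambertw(x, k=0).real     # principal branch W_0, values in [-1, 0)
wm1 = lambertw(x, k=-1).real   # non-principal branch W_{-1}, values in (-inf, -1]
assert np.isclose(w0 * np.exp(w0), x)
assert np.isclose(wm1 * np.exp(wm1), x)
```

The branches meet at x = −1/e (where w = −1), which is the branch point where the discontinuity issues mentioned here would arise.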

Yeah, we’ve been thinking about how to switch from the principal to the non-principal branch, but we can tackle that once we get to that point. I’m not sure how HMC is going to handle it, since, as you say, switching between the principal and non-principal branches introduces a discontinuity.

If I’m reading these papers correctly, the proposal is to use a LambertW transform on heavy-tailed data and then treat the transformed data as Gaussian? If I were modeling something heavy-tailed and skewed with Stan, I would directly implement the log probability of some distributional family that can exhibit those properties, e.g. Cauchy, Pareto-normal mixture, sinh-arcsinh, etc. It seems like this transformation would be more useful if we want to use inflexible statistical tools that assume Gaussian distributions. For example, Georg M. Goerg (what a name!!!) writes

In Stan, isn’t throwing the baby out with the bathwater the whole point?

Agreed, though if the Lambert W winds up being useful, there’s no reason not to have it.