Luca La Rocca writes:

You may like to know that the approach suggested in your post, Don’t do the Wilcoxon, is qualified as “common practice in Genome-Wide Association Studies”, according to this forthcoming paper in Biometrics to which I have no connection (and which I didn’t inspect beyond the Introduction).

The idea is that, instead of doing Wilcoxon or some other rank-based test, you first rank the data, then convert them to z-scores using an inverse-normal transformation, then just analyze these transformed data using regression or Anova or whatever.
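A minimal sketch of those three steps, using only the Python standard library. The data, the function names, the `(rank - 0.5) / n` offset, and the choice to give tied values their average rank are all illustrative assumptions, not something specified in the post.

```python
from statistics import NormalDist, mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def normal_scores(values):
    """Map ranks to z-scores via the inverse normal CDF of (rank - 0.5) / n."""
    n = len(values)
    inv = NormalDist().inv_cdf
    return [inv((r - 0.5) / n) for r in average_ranks(values)]

# Hypothetical data: an outcome y and a binary group indicator.
y = [1.2, 3.4, 0.7, 15.0, 2.2, 3.4, 9.1, 0.9]
group = [0, 0, 0, 1, 0, 1, 1, 0]

z = normal_scores(y)

# With a single binary predictor, the least-squares slope on the z-scores
# equals the difference between the two group means.
slope = (mean(zi for zi, g in zip(z, group) if g == 1)
         - mean(zi for zi, g in zip(z, group) if g == 0))
```

Once the outcome is on the z-score scale, the same `z` could be passed to any regression routine with additional predictors, which is the point of the exercise.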

It’s a natural idea, not one that I take any particular credit for. We put it in BDA just to explain how a Bayesian might want to attack a problem for which someone might have otherwise applied a nonparametric test. I’m pretty sure the idea has been reinvented a zillion other times before and after I did it. So I’m glad that, at least in some fields, it’s a standard approach.

Very common in (professional) finance, for decades at least.

Stupid question: what if the data have ties?

When we used this in our new R-hat paper, reviewers sent us hunting for references. We went back to at least 1937:

“The use of ranks to avoid the assumption of normality goes back to Friedman (1937). Chernoff and Savage (1958) show rank based approaches have good asymptotic efficiency. Instead of using rank values directly and modifying tests for them, Fisher and Yates (1938) propose to use expected normal scores (ordered statistics) and use the normal models. Blom (1958) shows that accurate approximation of the expected normal scores can be computed efficiently from ranks using an inverse normal transformation.”
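For reference, the approximation attributed to Blom is usually stated as below. The notation is my own: n is the sample size, r is the rank of an observation, z_(r) is the r-th order statistic of n standard normal draws, and Φ⁻¹ is the standard normal quantile function.

```latex
\mathbb{E}\left[ z_{(r)} \right] \approx \Phi^{-1}\!\left( \frac{r - 3/8}{n + 1/4} \right),
\qquad r = 1, \dots, n
```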

Hmm. I would suggest this is not actually “fully” Bayesian. The method Andrew describes is one step in using the so-called “rank likelihood”. The additional step is sampling the ranks from a truncated normal distribution.

So, first, indeed use the “inverse-normal transformation” but then next we want to sample the ranks. This is essentially a semi-parametric copula.

See for example

Graphical model:

Hoff, P. D. (2007). Extending the rank likelihood for semiparametric copula estimation. The Annals of Applied Statistics, 1(1), 265-283.

Multivariate regression:

Alexopoulos, A., & Bottolo, L. (2019). Bayesian Variable Selection for Gaussian copula regression models. arXiv preprint arXiv:1907.08245.

From the original post

> The question arises: if my simple recommended approach indeed dominates Wilcoxon, how is it that Wilcoxon remains popular? I think much has to do with computation: the inverse-normal transformation is now trivial, but in the old days it would’ve added a lot of work to what, after all, is intended to be rapid and approximate.

The persistence of this issue in statistics is absolutely baffling to me. It was mildly annoying at the turn of the millennium, when ubiquitous personal computers were 15 or so years old; today, when you can bootstrap 100,000 datapoints in a couple hundred milliseconds on your telephone, it’s a crime against science. My best guesses for why it’s still a thing are:

1. People use the tools their father gave them and then hand them down to their own kids

2. People prefer complex approximations with iffy asymptotic assumptions because it makes them feel like what they’re doing has more sophisticated mathematical content

Either way, it suggests this will continue to be a problem long into the age of dollar-a-day HPC clusters. Gah!

Does the transformed rank form have a direct, substantive interpretation?

Sentinel:

You can undo the rank transformation and then use average predictive comparisons to get back to the original scale of the data.
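One simple way to undo the transformation is to push a z-score back through the normal CDF and then read off the matching empirical quantile of the original data. The sketch below does only that step; it assumes the forward transform used `(rank - 0.5) / n`, interpolates linearly between neighbouring observations, and does not attempt the average predictive comparisons themselves. The function name and data are hypothetical.

```python
from statistics import NormalDist

def back_transform(z_value, original):
    """Map a z-score to the original scale via the empirical quantile function."""
    data = sorted(original)
    n = len(data)
    # Invert z = inv_cdf((rank - 0.5) / n) to recover a fractional 1-based rank.
    rank = NormalDist().cdf(z_value) * n + 0.5
    rank = min(max(rank, 1.0), float(n))  # stay inside the observed range
    lo = int(rank) - 1                    # 0-based index at or below the rank
    hi = min(lo + 1, n - 1)
    frac = rank - int(rank)
    # Linear interpolation between the two neighbouring observations.
    return data[lo] + frac * (data[hi] - data[lo])

# Hypothetical data on the original scale.
y = [1.2, 3.4, 0.7, 15.0, 2.2, 3.4, 9.1, 0.9]
median_on_original_scale = back_transform(0.0, y)
```

A prediction made on the z-score scale can be mapped back this way, although values outside the observed range are clipped to the smallest or largest observation.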

I noted in a comment above that the rank transformation is commonly used in copula regression, and allows for readily transforming back to the original scale.

But in general, these approaches do not merely take the ranks. While sampling, the bounds implied by each rank are used to sample from a truncated normal distribution, so the underlying latent (normal) data account for uncertainty. This is because the transform to normality implicitly assumes an underlying latent variable (similar to a probit model).
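A rough sketch of that sampling step, under several simplifying assumptions that are mine: unit variance, a fixed vector of means `mu` (in a real sampler these would come from the current regression or copula parameters), and no ordering constraint between tied observations. The function names are hypothetical and this is not the implementation from any of the papers cited above.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sample_truncated_normal(mu, lower, upper, rng):
    """Draw from N(mu, 1) restricted to [lower, upper] by inverting the CDF."""
    p_lo = STD_NORMAL.cdf(lower - mu)
    p_hi = STD_NORMAL.cdf(upper - mu)
    u = rng.uniform(p_lo, p_hi)
    # Keep u strictly inside (0, 1) so that inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    draw = mu + STD_NORMAL.inv_cdf(u)
    # Guard against rounding pushing the draw just outside the bounds.
    return min(max(draw, lower), upper)

def initial_scores(y):
    """Starting latent values: inverse-normal transform of average ranks."""
    n = len(y)
    scores = []
    for value in y:
        less = sum(other < value for other in y)
        equal = sum(other == value for other in y)
        rank = less + (equal + 1) / 2  # tied values share their average rank
        scores.append(STD_NORMAL.inv_cdf((rank - 0.5) / n))
    return scores

def resample_latent(y, z, mu, rng):
    """One sweep: redraw each z[i] between the latent values of its rank neighbours."""
    for i in range(len(y)):
        lower = max((z[j] for j in range(len(y)) if y[j] < y[i]), default=float("-inf"))
        upper = min((z[j] for j in range(len(y)) if y[j] > y[i]), default=float("inf"))
        z[i] = sample_truncated_normal(mu[i], lower, upper, rng)
    return z

# Hypothetical data; mu is held at zero purely for illustration.
rng = random.Random(0)
y = [1.2, 3.4, 0.7, 15.0, 2.2, 3.4, 9.1, 0.9]
mu = [0.0] * len(y)
z = initial_scores(y)
for _ in range(10):
    z = resample_latent(y, z, mu, rng)
```

Each sweep changes the latent values but never their ordering, so only the rank information in `y` is used.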

D, Dan:

Yes, it’s such an obvious idea that it makes sense that people are doing it all over. That’s why it’s doubly frustrating that lots of people in statistics and biostatistics use that Wilcoxon thing. The trouble, I think, is that statistical procedures are categorized by data type, so people are choosing their models not based on underlying structure but based on often arbitrary data features. I’ve even seen people take perfectly good ordered data and discard the ordering just so they can do a chi-squared test.

Well, the direct rank-score analysis does run into the issues highlighted in the paper, which are probably more relevant for tricky larger models and when you are interested in effect sizes. You are requiring the error distribution to live on this weird transformed space (which doesn’t really matter with robust se and large enough n), but more importantly blowing up additive and linear relationships in whatever the “native” space is. If you don’t have any prior information on which variables are likely to be additive and linear to help you judge what that transformation is and the native scale isn’t meaningful, then I guess you haven’t lost much. The “indirect” method (project the data to the null space of the covariates then rank and analyze) has predictable trouble with heavy tailed outcomes like log-normal.

I’m surprised at your enthusiasm given your prior posts on the danger of doing nonlinear least squares “wrong” in exactly this way (transform then analyze).

Although I appreciate your point that this is still a better alternative to a Wilcoxon than a complete strategy.

That was a major part of how inputs were transformed for a neural network in a recent Kaggle competition (https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/discussion/44629). I think that was the first time I really noticed this idea – I clearly skipped that chapter when reading BDA and never came across all these other references…

Andrew – can you point us to any evidence that your proposed procedure works about as well as (or better than) Wilcoxon?

I mean other than conceptual, like power curves or something?

Joshua:

If the only goal is to test for a difference between the groups, or to estimate the average difference in ranks, the rank-transformation procedure won’t do better than Wilcoxon; it will be basically the same as Wilcoxon. It basically is Wilcoxon. The point of rank transformation here is that it places the problem in a regression context so you can add more predictors.

Hmmm . . . let me check my original post. Here’s what I wrote:

It would not be hard to do stuff with power curves etc.; I don’t see the need to do so myself, as the basic idea is already clear to me.
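For anyone who does want to check, here is a small simulation sketch comparing rejection rates for the two approaches in a plain two-group problem. Everything about the setup is an assumption chosen for illustration: normal data, 30 observations per group, a location shift of 0.5, 500 replications, and the large-sample cutoff of 1.96 for both tests rather than exact reference distributions.

```python
import random
from statistics import NormalDist, mean, variance

INV = NormalDist().inv_cdf

def ranks(values):
    """1-based ranks for data without ties (continuous draws)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for position, index in enumerate(order):
        out[index] = position + 1
    return out

def wilcoxon_rejects(a, b, crit=1.96):
    """Rank-sum test using the large-sample normal approximation."""
    n1, n2 = len(a), len(b)
    w = sum(ranks(a + b)[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return abs(w - expected) / sd > crit

def normal_scores_rejects(a, b, crit=1.96):
    """Two-sample z-test on inverse-normal transformed ranks."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    z = [INV((r - 0.5) / n) for r in ranks(a + b)]
    za, zb = z[:n1], z[n1:]
    se = (variance(za) / n1 + variance(zb) / n2) ** 0.5
    return abs(mean(za) - mean(zb)) / se > crit

def estimate_power(shift, n_per_group=30, n_sims=500, seed=1):
    """Fraction of simulated datasets in which each test rejects."""
    rng = random.Random(seed)
    hits_w = hits_z = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
        hits_w += wilcoxon_rejects(a, b)
        hits_z += normal_scores_rejects(a, b)
    return hits_w / n_sims, hits_z / n_sims

power_wilcoxon, power_scores = estimate_power(shift=0.5)
```

Calling `estimate_power` over a grid of `shift` values would trace out rough power curves for both procedures; swapping `rng.gauss` for a heavier-tailed generator is the obvious next variation.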