Heller, Heller, and Gorfine on univariate and multivariate information measures

Malka Gorfine writes:

We noticed that the important topic of association measures and tests came up again in your blog, and we have a few comments in this regard.

It is useful to distinguish between the univariate and multivariate methods. A consistent multivariate method can recognise dependence between two vectors of random variables, while a univariate method can only loop over pairs of components and check for dependency between them.
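
To make the distinction concrete, here is an illustrative toy case that is not part of the letter: X1 and X2 are independent fair ±1 coins and Y = X1·X2. Y is independent of each component taken alone, yet it is a deterministic function of the vector (X1, X2), so looping over pairs of components finds nothing. A minimal sketch that checks this by exact enumeration (the helper `is_independent` is a hypothetical name introduced here):

```python
from fractions import Fraction
from itertools import product

# Toy joint distribution (an assumption for illustration): X1 and X2 are
# independent fair +/-1 coins and Y = X1 * X2. Each outcome has mass 1/4.
outcomes = [((x1, x2), x1 * x2) for x1, x2 in product([-1, 1], repeat=2)]
mass = Fraction(1, len(outcomes))


def is_independent(f, g):
    """Exact check that f(outcome) and g(outcome) are independent."""
    joint, pf, pg = {}, {}, {}
    for o in outcomes:
        a, b = f(o), g(o)
        joint[(a, b)] = joint.get((a, b), 0) + mass
        pf[a] = pf.get(a, 0) + mass
        pg[b] = pg.get(b, 0) + mass
    # Independence means the joint mass factorizes for every pair of values.
    return all(joint.get((a, b), 0) == pf[a] * pg[b] for a in pf for b in pg)


def y(o):
    return o[1]


# Univariate view: each component of X against Y, one pair at a time.
pairwise = [is_independent(lambda o, i=i: o[0][i], y) for i in range(2)]
# Multivariate view: the whole vector X against Y.
vector_level = is_independent(lambda o: o[0], y)

print(pairwise)      # [True, True]
print(vector_level)  # False
```

Every pairwise check passes while the vector-level check fails, which is exactly the situation a consistent multivariate test is meant to catch.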

There are very few consistent multivariate methods. To the best of our knowledge there are three practical methods:

1) HSIC by Gretton et al.

2) dcov by Székely et al.

3) A method we introduced in Heller et al. (Biometrika, 2013, 503–510); an R package, HHG, is available as well.

As proved in Sejdinovic et al. (Annals of Statistics, 2013, 2263–2291), the first two methods are somewhat equivalent.
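
For readers who want to see what such a test looks like in practice, the following is a minimal sketch of the squared sample distance covariance (in its V-statistic form) with a permutation p-value. It is a plain-Python illustration, not the code of the energy or HHG packages; the function names, the toy data, and the defaults `n_perm=200` and `seed=0` are assumptions made here.

```python
import math
import random


def _centered_distances(points):
    """Double-centered Euclidean distance matrix for a list of vectors."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def dcov2(xs, ys):
    """Squared sample distance covariance between paired vectors xs, ys."""
    a, b = _centered_distances(xs), _centered_distances(ys)
    n = len(xs)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2


def permutation_pvalue(xs, ys, n_perm=200, seed=0):
    """Permutation p-value: shuffling ys breaks any link with xs."""
    rng = random.Random(seed)
    observed = dcov2(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += dcov2(xs, shuffled) >= observed
    return (1 + hits) / (1 + n_perm)


# Toy data (hypothetical): X is a 2-vector of +/-1 values and Y = X1 * X2,
# with the four balanced combinations repeated five times.
base = [((1, 1), (1,)), ((1, -1), (-1,)), ((-1, 1), (-1,)), ((-1, -1), (1,))]
xs = [x for x, _ in base] * 5
ys = [y for _, y in base] * 5
x1_only = [(x[0],) for x in xs]

print(dcov2(xs, ys), permutation_pvalue(xs, ys))  # vector X against Y
print(dcov2(x1_only, ys))                         # first component only
```

On this toy data the statistic for the first component alone is zero up to rounding, while the statistic for the full vector is clearly positive. Observations are passed as tuples so that the same code handles vectors of any dimension.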

As to univariate methods, there are many consistent methods, and some of them are:

1) Hoeffding

2) Various methods based on mutual information estimation.

3) Any of the multivariate methods mentioned above.

4) A new class of methods we recently developed and currently available at

Regarding MIC, we fully agree with the criticism of Professor Kinney that “there is no good reason to use MIC”. We would also like to add that since MIC requires exponential time to calculate, what is actually used is an approximation. However, this approximation might not be consistent even in the limited cases for which MIC was proven to be consistent. Therefore, MIC is not on the list above of consistent univariate methods.

Furthermore, in multiple independent power analyses, MIC has been found to have lower power than other methods (Simon and Tibshirani; Gorfine et al.; and de Siqueira Santos et al.).

Regarding equitability, we again concur with Kinney and Atwal that contrary to its claim, MIC is not equitable and mutual information is an equitable measure (in the sense defined by Kinney and Atwal). However, we agree with Professor Gelman (if we understood him correctly) that being equitable is not necessarily a good thing and therefore this does not mean that MI should be the only method used to test dependence (especially as it is hard to estimate). In fact, perhaps bias towards “simpler” relationships is a good thing. Of course, one needs to find a good definition of “simpler” and we hope to contribute to that research direction in the future.

On behalf of Ruth Heller, Yair Heller and Malka Gorfine

I have nothing to add here. This is an important topic I don’t know much about, and I’m happy to circulate the ideas of researchers in this area.


  1. Anonymous says:

    In case you or anyone else is interested, the Columbia statistics department is having a free conference tomorrow about nonparametric measures of statistical dependence. Several of the authors of the methods listed above will be giving talks.

  2. Anonymous says:

    Thanks for the post!
    I want to use an independence test to reject a no-difference hypothesis in a 2-sample test (by testing the independence of a group A/B indicator variable with the group A/B sample vectors).

    Which one of these tests would be most suited for this specific task?

    • Yair Heller says:

      Indeed, the two-sample problem can be viewed as an instance of the independence problem. However, sometimes there are subtleties (e.g. handling of ties) that require specific changes for the two-sample case. Furthermore, sometimes more efficient algorithms can be used for the two-sample case. In fact, we are working on a more efficient version of our independence test for the two-sample problem, and if you contact us directly we can supply you with a preliminary version of the R package. Similarly, in the energy package you can find an implementation of dcov for the two-sample problem.
      Regarding other methods, Larry Wasserman in his blog had an excellent post on the two sample problem at

    • Fernando says:

      What exactly do you mean by “no difference”?

      • The Wind. says:

        I assume “a no-difference hypothesis” is just the null hypothesis that both groups are drawn from exactly the same distribution – the null for something like a KS test.
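
The reduction discussed in this thread can be sketched in a few lines. Permuting the A/B labels of the pooled sample is the same as testing whether the group indicator is independent of the observations. The sketch below uses the energy distance between two one-dimensional samples as the test statistic. It is a plain-Python illustration rather than the energy package's implementation; the function names, the toy samples, and the defaults `n_perm=500` and `seed=0` are assumptions.

```python
import random


def energy_stat(x, y):
    """Energy distance between two 1-D samples (V-statistic form)."""
    def mean_abs(u, v):
        return sum(abs(a - b) for a in u for b in v) / (len(u) * len(v))
    return 2 * mean_abs(x, y) - mean_abs(x, x) - mean_abs(y, y)


def two_sample_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for the null that x and y share a distribution."""
    rng = random.Random(seed)
    observed = energy_stat(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is equivalent to permuting A/B labels.
        rng.shuffle(pooled)
        hits += energy_stat(pooled[:n_x], pooled[n_x:]) >= observed
    return (1 + hits) / (1 + n_perm)


# Toy samples (hypothetical): group B is group A shifted by 3.
group_a = [i / 10 for i in range(15)]
group_b = [3 + i / 10 for i in range(15)]

print(energy_stat(group_a, group_b))
print(two_sample_pvalue(group_a, group_b))
```

A small p-value rejects the no-difference hypothesis. Ties and efficiency, the subtleties mentioned above, are not handled specially in this sketch.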