How many clusters? And a Declaration of Selection Bias.

Pointing to this article with Pietro Coretto, “An adequacy approach for deciding the number of clusters for OTRIMLE robust Gaussian mixture based clustering,” Christian Hennig writes:

You will see that this is a fairly specialist methodological paper on cluster analysis, so why should you be interested in it?

Actually, when presenting new statistical methodology and running simulations that show that it works, there is often a huge selection bias: we (or probably at least many of us) will look hard for setups in which the new method that we want to propose works well, and we will occasionally adapt our method if we find that it doesn’t work as well as we’d hope, in order to make it look better, and use these better results to promote the method instead of trying it out on some new data or data generating processes (DGPs).

Here’s the key passage from Hennig and Coretto’s paper:

Declaration of selection bias: As this paper introduces a new method, as a proof of concept we need to show some situations in which it works well. We looked at some other datasets and data generating mechanisms (although usually with a very small number of test runs). In many cases there was no big difference between the different methods, and sometimes mclust with or without noise, or a mixture of t-distributions or skew t-distributions worked better (than our new method), though never all of them. Sometimes nothing worked well. So we do not claim that adotrimle (our new method) is universally the best, just where we show it is. DGP 3 was the first DGP we tried, and we show it despite not being a clear win for the new methods.
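To see why a declaration like this matters, here is a minimal sketch of the selection effect itself. It is a toy illustration, not anything from the paper: the function name `simulate`, the noise level, and the numbers of setups are all made-up assumptions, and the two "methods" are just noisy scores with identical true performance. If we try many setups but report only the ones where the new method happened to look best, the reported advantage is positive even though the true advantage is zero.

```python
import random
import statistics


def simulate(n_setups=200, n_reported=5, noise=0.1, seed=0):
    """Toy model of selection bias in simulation studies (hypothetical setup).

    In every setup, the new and old methods have the SAME true performance;
    each observed score is just that shared value plus independent noise,
    so the observed difference is pure noise centered at zero.
    """
    rng = random.Random(seed)
    # Observed (new - old) score difference in each setup we tried.
    diffs = [rng.gauss(0.0, noise) - rng.gauss(0.0, noise)
             for _ in range(n_setups)]

    # Honest summary: average over every setup we tried.
    mean_all = statistics.mean(diffs)

    # Selected summary: report only the setups where the new method looked best.
    reported = sorted(diffs, reverse=True)[:n_reported]
    mean_reported = statistics.mean(reported)
    return mean_all, mean_reported


mean_all, mean_reported = simulate()
print(f"mean difference over all setups:      {mean_all:+.3f}")
print(f"mean difference over reported setups: {mean_reported:+.3f}")
```

The gap between the two printed numbers is entirely an artifact of choosing what to show, which is the thing a declaration of selection bias lets the reader discount.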

1 thought on “How many clusters? And a Declaration of Selection Bias.”

  1. It would be wonderful if this sort of thing caught on. For non-researchers (read: users of these techniques) it can be hard to know where to turn when your traditional toolkit hits an edge case or is less performant than you’d like.

    I’m much more likely to spend the time to understand / toy with a method that is presented this way, instead of the “dating profile” way we usually see (wow you’re X lbs heavier, Y inches shorter, and Z years older than your pictures!).

    Good on ’em.
