They’re trying to get a hold on the jungle of cluster analysis.

Posted on May 12, 2024 9:29 AM by Andrew

Iven Van Mechelen, Christian Hennig, and Henk Kiers write:

The domain of cluster analysis is a meeting point for a very rich multidisciplinary encounter, with cluster-analytic methods being studied and developed in discrete mathematics, numerical analysis, statistics, data analysis, data science, and computer science (including machine learning, data mining, and knowledge discovery), to name but a few. The other side of the coin, however, is that the domain suffers from a major accessibility problem as well as from the fact that it is rife with division across many pretty isolated islands. As a way out, the present paper offers a thorough and in-depth review of the clustering domain as a whole under the form of an outline map based on an overarching conceptual framework and a common language. With this framework we wish to contribute to structuring the clustering domain, to characterizing methods that have often been developed and studied in quite different contexts, to identifying links between methods, and to introducing a frame of reference for optimally setting up cluster analyses in data-analytic practice.

So, they’re trying to apply a sort of clustering to . . . the field of cluster analysis. I’m not knowledgeable enough about this area to evaluate their effort. I’m sharing it with those of you who might know more.

Also, I’m sympathetic to this work because a few years ago Hennig and I did something similar in our attempt to organize ideas around the concepts of subjectivity and objectivity in statistics.

4 thoughts on “They’re trying to get a hold on the jungle of cluster analysis.”

Christian Hennig on May 12, 2024 at 10:22 am said:

The paper has just been accepted by WIREs Data Mining and Knowledge Discovery.

Sec. 2 has something on general data analytic workflow, potentially of interest beyond cluster analysis.

John Mashey on May 13, 2024 at 2:29 am said:

Roger K Blashfield (my cousin’s husband, published often on cluster analysis, sometimes visited Bell Labs to talk to Joe Kruskal) and MS Aldenderfer wrote
“The Literature On Cluster Analysis”, originally 1978.
https://www.tandfonline.com/doi/abs/10.1207/s15327906mbr1303_2 (paywall, sorry)
I’ve mislaid my copy, but Roger had told me about this paper, which had difficulty getting published, but was amusing.
He said they’d found the same mathematics across many disciplines, but usually with different terminologies and notations,
except for a few weird cases where seemingly-unconnected fields (via people X & Y) used the same ones for no obvious reason.
They’d investigated, and found there were relationships that don’t usually show up in standard Social Network Analyses like coauthorship studies.
As I recall there were cases ~ X & Y had been college roommates, or X was married to Y’s sister.

- Andrew on May 13, 2024 at 8:22 am said:
  
  John,
  
  That’s an interesting use of network analysis to analyze network analysis! I’m vaguely reminded of a plan that Aki and I had, which we never carried out, which was to perform a bunch of analyses based on taking various science-jargon terms literally. For example, we would analyze a “toy problem” that would actually be data about toys. I can’t remember our other examples. When I took stochastic processes many years ago, we were told of the “counter example,” which was not an actual counterexample; rather, it was a queuing problem involving people waiting at a counter, which was called the “counter example” as a joke.
  
John Mashey on May 13, 2024 at 12:21 pm said:

Amusing: when I took stochastic processes & operations research courses in college, we had counter examples, but not as jokes.
1) At Bell Labs, we applied informal Social Network Analysis by figuring out “gatekeeper networks” to help in technology diffusion,
i.e., there were lots of important informal relationships that didn’t show up on organization charts.

2) Of course, the Wegman affair involved much SNA misuse & I bought a few textbooks as a result, and later got an email from an SNA professor in Germany who used the Wegman Report as an example of misuse of SNA.

Of course:
Google: “cluster analysis” “social network analysis” gets many hits.


Statistical Modeling, Causal Inference, and Social Science
