High-dimensional data analysis

I came across this talk by David Donoho (see also here for more detail) from 2000. I was disappointed to see that he scooped me on the phrase “blessing of dimensionality” but I guess this is not such an obscure idea.

More interesting are the different perspectives that one can have on high-dimensional data analysis. Donoho’s presentation (which certainly is still relevant six years later) focuses on computational approaches to data analysis and says very little about models. Bayesian methods are not mentioned at all (the closest is slide #44, on Hidden Components, but no model is specified for the components themselves). It’s good that there are statisticians working on different problems using such different methods.

Donoho also discusses Tukey’s ideas of exploratory data analysis and argues that Tukey’s approach of separation from mathematics no longer makes sense. I agree with Donoho on this, although perhaps from a different statistical perspective: my take on exploratory data analysis is that (a) it can be much more powerful when used in conjunction with models, and (b) as we fit increasingly complicated models, it will become more and more helpful to use graphical tools (of the sort associated with “exploratory data analysis”) to check these models. As a latter-day Tukey might say, “with great power comes great responsibility.” See this paper and this paper for more on this.

I was also trying to understand the claim on page 14 of Donoho’s presentation that the fundamental roadblocks of data analysis are “only mathematical.” From my own experiences and struggles (for example, here), I’d interpret this from a Bayesian perspective as a statement that the fundamental challenge is coming up with reasonable classes of models for large problems and large datasets: models that are structured enough to capture important features of the data but not so constrained as to restrict the range of reasonable inferences. (For a non-Bayesian perspective, just replace the word “model” with “method” in the previous sentence.)
