Igor Carron reports on a paper by Richard Baraniuk and Michael Wakin, “Random projections of smooth manifolds,” that is billed as a universal dimension reduction tool. That sounds pretty good.
I’m skeptical about the next part as described by Carron, though: a method for discovering the dimension of a manifold. This is an ok mathematical problem, but in the problems I work on (mostly social and environmental statistics), the true dimension of these manifolds is infinity, so there’s nothing to discover. Rather than a statement like, “We’ve discovered that congressional roll-call votes fall on a 3-dimensional space” or “We estimate the dimensionality of roll-call voting to be 3” or even “We estimate the dimensionality to be 3 +/- 2”, I prefer a statement like, “We can explain 90% of the variance with three dimensions” or “We can correctly predict 93% of the votes using a three-dimensional model” or whatever.
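As a rough illustration of the kind of statement preferred above, here is a minimal sketch that reports the share of variance explained by the first three principal components. It runs on synthetic data, not roll-call votes; the function names (explained_variance_ratio, cumulative_explained), the data sizes, and the noise level are all assumptions made up for the example.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                   # center each column
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, largest first
    var = s ** 2                              # proportional to component variances
    return var / var.sum()

def cumulative_explained(X, k):
    """Share of variance explained by the first k components."""
    return float(explained_variance_ratio(X)[:k].sum())

# Synthetic stand-in data (an assumption): 3 latent dimensions plus noise
# in every one of the 50 observed dimensions, so no "true" dimension
# is ever reached exactly.
rng = np.random.default_rng(0)
n, p, k_latent = 200, 50, 3
latent = rng.normal(size=(n, k_latent))
loadings = rng.normal(size=(k_latent, p))
X = latent @ loadings + 0.5 * rng.normal(size=(n, p))

ratios = explained_variance_ratio(X)
share = cumulative_explained(X, 3)
print(f"first 3 components explain {share:.0%} of the variance")
```

The output is a share of variance for a chosen number of dimensions, not an estimate of a dimension; adding a fourth or fifth component would explain a little more, with no natural stopping point.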
Andrew,
I agree with you, mainly in the context of the social sciences. In the harder sciences, for instance fluid mechanics, obtaining a bound on the dimension of the data manifold would aim at reducing the number of experiments needed to have a good understanding of the phenomena at hand. Furthermore, physical phenomena generally have many different scales, so a dimensionality estimate reflects not only the phenomena studied but also the particularities of the sensor. One is lucky when the dimension has some kind of cut-off. There is also another aspect to knowing the dimension of a manifold: when it changes. In physics or even the social sciences, people are generally interested in changes such as phase changes or critical phenomena. Taking your example, it might be interesting to find out that the roll-call dimension for one bill was 2 and that the roll-call dimension for a similar bill five years later was 7.
Igor.
In an analogy to regression, random projections strike me as random choices of a small subset of predictors. Yes, of course, there is a lot of information around, and with sufficiently many subsets you can get pretty good (theoretical) predictive accuracy. But, on the other hand, is such activity really intuitively sensible?
I agree with the concern Aleks brought up. Some of it may make sense, but much of it may not.
Paul
Aleks,
You said:
"..random projections strike me as random choices of a small subset of predictors…"
I am not sure if what I am going to say will make sense, but let me try:
As far as I can tell, it is not a random choice of a few items that eventually yield the full information. Rather, in most cases, it is the random combination of all the items that yields the compressed measurements. In some cases, people have been trying to take a few random measurements (using what they call a sparse measurement matrix) but they have to add another step called group testing in order to obtain the full information back eventually.
( Compressed Sensing Reconstruction via Belief Propagation,
http://www.dsp.ece.rice.edu/cs/csbpTR07142006.pdf )
Igor.
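The distinction drawn in the comment above, between picking a few items at random and taking random combinations of all items, can be sketched in a few lines. This is only an illustration under assumptions: the sparse test signal, the sizes p and m, and the Gaussian measurement matrix A are chosen for the example and are not taken from the paper or the cited report.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 1000, 20   # ambient dimension and number of measurements (assumed)

# A sparse signal: only 3 of the 1000 coordinates are nonzero.
x = np.zeros(p)
support = rng.choice(p, size=3, replace=False)
x[support] = [5.0, -3.0, 2.0]

# (a) Random subset: keep m coordinates picked at random.
#     Most draws miss the few informative coordinates entirely.
subset = rng.choice(p, size=m, replace=False)
y_subset = x[subset]

# (b) Random projection: each of the m measurements is a random
#     linear combination of ALL p coordinates (dense Gaussian matrix).
A = rng.normal(size=(m, p)) / np.sqrt(m)
y_proj = A @ x

print("nonzero entries seen by the random subset:", int(np.count_nonzero(y_subset)))
print("nonzero entries in the random projection: ", int(np.count_nonzero(y_proj)))
```

In (a) a measurement carries information only if it happens to land on one of the three nonzero coordinates, whereas in (b) every measurement mixes in all three. Recovering x from y_proj is a separate reconstruction step that this sketch does not attempt.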
Igor, thanks for pointing out the limitations of my analogy. It's true that making random decisions is often better than repeating a deterministic decision: this is the basis for randomized algorithms and many other useful constructs. However, adaptive selection of decisions that combines randomness and information is quite often better than pure randomness. In the context of dimension reduction, there are better methods (projection pursuit, PCA, MDS) than just taking random linear projections.
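One way to make the last point concrete is to compare how much variance a fixed number of dimensions retains under PCA versus under a random linear projection. The sketch below assumes synthetic data, a helper named variance_retained, and a random projection whose directions are orthonormalized with a QR factorization so that the two shares are comparable; none of this comes from the paper under discussion.

```python
import numpy as np

def variance_retained(Xc, basis):
    """Share of the total variance of centered data Xc that survives
    projection onto the orthonormal columns of `basis`."""
    return float(np.sum((Xc @ basis) ** 2) / np.sum(Xc ** 2))

# Synthetic data (an assumption): low-dimensional structure plus noise.
rng = np.random.default_rng(2)
n, p, k = 300, 40, 3
X = rng.normal(size=(n, 3)) @ rng.normal(size=(3, p)) + 0.5 * rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)

# PCA: the top-k right singular vectors of the centered data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_basis = Vt[:k].T

# Random projection: k Gaussian directions, orthonormalized via QR.
rand_basis, _ = np.linalg.qr(rng.normal(size=(p, k)))

pca_share = variance_retained(Xc, pca_basis)
rand_share = variance_retained(Xc, rand_basis)
print(f"PCA keeps {pca_share:.0%} of the variance; a random projection keeps {rand_share:.0%}")
```

For a given k, the PCA subspace retains at least as much variance as any other k-dimensional subspace, so the comparison always favors PCA on this criterion. Random projections are usually motivated by other properties, such as low cost and approximate preservation of pairwise distances, which this sketch does not measure.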