Reading somebody else’s statistics rant made me realize the inherent contradictions in much of my own statistical advice.

Jeff Lax sent along this article by Philip Schrodt, along with the cryptic comment:

Perhaps of interest to you, perhaps not. Not meant to be an excuse for you to rant against hypothesis testing again.

In his article, Schrodt makes a reasonable and entertaining argument against the overfitting of data and the overuse of linear models. He states that his article is motivated by the quantitative papers he has been sent to review for journals or conferences, and he explicitly excludes “studies of United States voting behavior,” so at least I think Mister P is off the hook.

I notice a bit of incoherence in Schrodt’s position: on one hand, he criticizes “kitchen-sink models” for overfitting and he criticizes “using complex methods without understanding the underlying assumptions” . . . but then later on he suggests that political scientists in this country start using mysterious (to me) methods such as correspondence analysis, support vector machines, neural networks, Fourier analysis, hidden Markov models, topological clustering algorithms, and something called CHAID! Not to burst anyone’s bubble here, but if you really think that multiple regression involves assumptions that are too much for the average political scientist, what do you think is going to happen with topological clustering algorithms, neural networks, and the rest?

As in many rants of this sort (my own not excepted), **there is an inherent tension between two feelings**:

1. The despair that people are using methods that are too simple for the phenomena they are trying to understand.

2. The near-certain feeling that many people are using models too complicated for them to understand.

Put 1 and 2 together and you get a mess. On one hand, I find myself telling people to go simple, simple, simple. When someone gives me their regression coefficient I ask for the average, when someone gives me the average I ask for a scatterplot, when someone gives me a scatterplot I ask them to carefully describe one data point, please.

On the other hand, I’m always getting on people’s case about too-simple assumptions, for example, analyzing state-level election results over a 50-year period and thinking that controlling for “state dummies” solves all their problems. As if Vermont in 1952 were the same as Vermont in 2002.

When it comes to specifics, I think my advice is ok. For example, suppose I suggest that someone, instead of pooling 50 years of data, do a separate analysis for each year or each decade and then plot the estimates over time. This recommendation of the secret weapon actually satisfies both criteria 1 and 2 above: the model varies by year (or by decade) and is thus more flexible than the “year dummies” model that preceded it; but at the same time the new model is simpler and cleaner.
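As a concrete illustration of the secret weapon, here is a minimal sketch on simulated data. Everything here is hypothetical (the data-generating process, the sample sizes, the drifting coefficient), chosen only to show the mechanics: fit a separate least-squares regression within each year, save the coefficient and its standard error, and then plot the series of estimates against time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical election-style data: 50 years, one predictor whose effect
# drifts slowly over time. Nothing here comes from a real dataset.
years = np.arange(1952, 2002)
estimates, std_errors = [], []
for t, year in enumerate(years):
    n = 50                                   # observations in this year
    x = rng.normal(size=n)
    true_beta = 0.5 + 0.01 * t               # effect drifts across years
    y = true_beta * x + rng.normal(size=n)

    # Separate OLS fit for this year alone.
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

    estimates.append(beta[1])
    std_errors.append(se)

# To finish the secret weapon, plot estimates +/- 1 s.e. against year
# (e.g., with matplotlib's errorbar) and look at how the coefficient moves.
```

The payoff is in the picture: 50 small, transparent regressions, each simple enough to understand, with the time variation displayed rather than assumed away.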

Still and all, there’s a bit of incoherence in telling people to go more sophisticated and simpler at the same time, and I think people who have worked with me have seen me oscillate in my advice, first suggesting very basic methods and then pulling out models that are too complicated to fit. As I like to say, I always want to use the simplest possible method that’s appropriate for any problem, but said method always ends up being something beyond my ability to compute or even, often, to formulate.

To return to Schrodt’s article, I have a couple of minor technical disagreements. He writes that learning hierarchical models “doesn’t give you ANOVA.” Hold that thought! Check out chapter 22 of ARM and my article in the Annals of Statistics. And I think Schrodt is mixing apples and oranges by throwing in computational methods (“genetic algorithms and simulated annealing methods”) in his list of models. Genetic algorithms and simulated annealing methods can be used for optimization and other computational tasks but they’re not models in the statistical (or political science) sense of the word.

Finally, near the end of his paper Schrodt is looking for a sensible Bayesian philosophy of science. I suggest he look here, for a start.

P.S. A special Follower-of-the-Blog award to the first person who recognizes the blog entry that the above title is echoing.

Maybe political scientists should use cross-validation etc. to balance bias and variance? I also wonder if part of the problem here is Schrodt being unhappy with kitchen-sink models as tools for causal inference, not prediction per se. (Also, SVMs seem like a weird thing to recommend to political scientists…)
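To make the cross-validation suggestion concrete, here is a minimal sketch (entirely hypothetical data: only the first 2 of 40 predictors actually matter) comparing a kitchen-sink regression against a small model by out-of-sample error:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 40 candidate predictors, only the first two matter.
n, p = 200, 40
X = rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + rng.normal(size=n)

def cv_mse(X, y, k=5):
    """Mean squared prediction error of OLS under k-fold cross-validation."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Xt = np.column_stack([np.ones(len(train)), X[train]])
        Xv = np.column_stack([np.ones(len(fold)), X[fold]])
        beta, *_ = np.linalg.lstsq(Xt, y[train], rcond=None)
        errs.append(np.mean((y[fold] - Xv @ beta) ** 2))
    return np.mean(errs)

simple_mse = cv_mse(X[:, :2], y)    # model with the two real predictors
kitchen_sink_mse = cv_mse(X, y)     # all 40 predictors thrown in
```

On data like these, the kitchen-sink model typically predicts worse out of sample because its 38 spurious coefficients are fit to training noise, which is the bias-variance tradeoff the comment is pointing at.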

made me realize the inherent contradictions in much of my own statistical advice.

Gee, I thought the whole point of Bayesian inference was to always be coherent (leaving aside empirical Bayes). Well, sometimes a cigar is just a cigar.

Numeric:

I never claimed to be coherent. See chapter 1 of Bayesian Data Analysis, starting with the very first page of the chapter (on the three steps of Bayesian data analysis: model building, inference, and model checking).

Since coherence is out the window, perhaps, as a disinterested observer, you would care to weigh in on the Templeton paper (Coherent and incoherent inference in phylogeography and human evolution, PNAS, 2010 April 6; 107(14): 6376–6381). You might consider Robert's comments (arxiv.org/pdf/1006.3854) if you wish to address this.

Thanks for the link, numeric. We published our letter to PNAS about the incoherence in Templeton's piece, followed by a reply from the author in which he teaches us the basics of probability! I think this debate has gone beyond incoherence, with Templeton replying to any criticism with the same non-mathematical argument that the mathematical framework for Bayesian tests is mathematically wrong.

The paper was a good fun read, though I tend to roll my eyes at the philosophy of science stuff.

Methodologically, most of the paper can be distilled to "If you have important prior information, use it and put it into a well-thought-out model. If you don't, then use simple robust methods to get a rough idea and hopefully inspire a well-thought-out model. The result is basically equivalent to what you'd get with flat priors anyway."

Central to this, though, is the need to think carefully about the model and test its assumptions. Since the model assumptions are false, it's also important to assess the robustness of inferences to deviations from those assumptions.

Correspondence analysis "is just" a concise (low-dimensional) plot of (log-linear model) residuals.

K?
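To unpack that claim a bit: classical correspondence analysis can be computed as an SVD of the table of standardized residuals from the independence model, so it really is a low-dimensional picture of residuals. A minimal sketch on a made-up contingency table (all counts hypothetical):

```python
import numpy as np

# Toy contingency table (hypothetical counts): rows and columns could be,
# say, party by region.
N = np.array([[30., 10.,  5.],
              [10., 25., 15.],
              [ 5., 15., 35.]])

P = N / N.sum()                  # correspondence matrix
r = P.sum(axis=1)                # row masses
c = P.sum(axis=0)                # column masses

# Standardized (Pearson-type) residuals from the independence model r c^T:
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# The SVD of the residual matrix gives the correspondence-analysis axes;
# principal coordinates rescale the singular vectors by the masses.
U, sv, Vt = np.linalg.svd(S)
row_coords = (U[:, :2] * sv[:2]) / np.sqrt(r)[:, None]
col_coords = (Vt.T[:, :2] * sv[:2]) / np.sqrt(c)[:, None]

# Total inertia = sum of squared singular values = chi-square statistic / n.
inertia = (sv ** 2).sum()
```

Plotting the first two columns of `row_coords` and `col_coords` gives the familiar CA biplot, which is exactly a concise display of departures from independence.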

What if we take to heart Box's quote "All models are wrong, but some models are useful"? Even the simplest model can be useful in some situations. Of course, that raises the question of WHEN a very simple model is useful enough to make up for its wrongness.

I think we should also bear in mind the arguments raised in Abelson's "Statistics as principled argument". That is, statistics is part of a principled argument making a case for a particular opinion or theory. (I highly recommend Abelson's book).