Haynes Goddard writes:
I have been slowly working my way through the grad program in stats here, and the latest course was a biostats course on categorical and survival analysis. I noticed in the semi-parametric and parametric material (Wang and Lee is the text) that they use stepwise regression a lot.
I learned in econometrics that stepwise is poor practice, as it defaults to the “theory of the regression line” — that is, no theory at all, just the variation in the data.
I don’t find the topic on your blog, and wonder if you have addressed the issue.
Stepwise regression is one of those things, like outlier detection and pie charts, that appear to be popular among non-statisticians but are considered by statisticians to be a bit of a joke. For example, Jennifer and I don’t mention stepwise regression in our book, not even once.
To address the issue more directly: the motivation behind stepwise regression is that you have a lot of potential predictors but not enough data to estimate their coefficients in any meaningful way. This sort of problem comes up all the time, for example here’s an example from my research, a meta-analysis of the effects of incentives in sample surveys.
The trouble with stepwise regression is that, at any given step, the model is fit using unconstrained least squares. I prefer methods such as factor analysis or lasso that group or constrain the coefficient estimates in some way.
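To make the contrast concrete, here is a minimal sketch in plain NumPy — the simulated data, the number of steps, and the penalty value are all made up for illustration. The first function is greedy forward stepwise selection, which at every step refits the chosen subset by unconstrained least squares; the second is a bare-bones lasso via coordinate descent, which instead shrinks all coefficients toward zero through an L1 penalty:

```python
import numpy as np

def forward_stepwise(X, y, k):
    """Greedy forward selection: at each step, add the predictor that most
    reduces the residual sum of squares, refitting the whole subset by
    unconstrained least squares (the practice criticized above)."""
    selected = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    return selected, beta

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent with soft-thresholding:
    minimizes 0.5*||y - X b||^2 + lam*||b||_1. Coefficients are shrunk
    continuously toward zero rather than kept or dropped outright."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            # partial residual with predictor j removed from the fit
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            z = X[:, j] @ X[:, j]
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta

# Made-up example: 40 observations, 10 candidate predictors, 2 real signals.
rng = np.random.default_rng(0)
n, p = 40, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, 1.5]
y = X @ beta_true + rng.standard_normal(n)

selected, beta_sw = forward_stepwise(X, y, k=3)   # hard in-or-out choices
beta_l1 = lasso_cd(X, y, lam=10.0)                # smooth shrinkage instead
```

The point of the sketch: stepwise makes a sequence of hard in-or-out decisions and then fits the survivors with no constraint at all, while the lasso keeps every predictor in play but pulls the estimates toward zero, which is one way of getting the grouping or constraining of coefficients mentioned above.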