The “Canadian lynx data” is one of the famous examples used in time series analysis. And the usual models fit to these data in the statistics time-series literature don’t work well. Cavan Reilly and Angelique Zeringue write:
Reilly and Zeringue then present their analysis. Their simple little predator-prey model with a weakly informative prior way outperforms the standard big-ass autoregression models. Check this out:
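Reilly and Zeringue's exact specification and priors aren't reproduced here. To give a rough idea of what a "predator-prey model" means in this setting, here is a minimal sketch assuming the textbook Lotka-Volterra equations for prey (hare) density `h` and predator (lynx) density `l`, integrated with fixed-step fourth-order Runge-Kutta. The function names and the parameter values in the usage note are hypothetical and purely illustrative, not estimates from the lynx data.

```python
import math
from typing import List, Tuple

Params = Tuple[float, float, float, float]  # (alpha, beta, delta, gamma), hypothetical names


def lotka_volterra(h: float, l: float, p: Params) -> Tuple[float, float]:
    """Right-hand side of the classic Lotka-Volterra system."""
    alpha, beta, delta, gamma = p
    dh = alpha * h - beta * h * l   # prey grow on their own and get eaten by predators
    dl = delta * h * l - gamma * l  # predators grow by eating prey and die off otherwise
    return dh, dl


def simulate(h0: float, l0: float, p: Params, t_end: float,
             dt: float = 0.01) -> List[Tuple[float, float, float]]:
    """Integrate with fixed-step RK4; returns (t, h, l) at every step."""
    def f(h: float, l: float) -> Tuple[float, float]:
        return lotka_volterra(h, l, p)

    t, h, l = 0.0, h0, l0
    path = [(t, h, l)]
    for _ in range(int(round(t_end / dt))):
        k1 = f(h, l)
        k2 = f(h + 0.5 * dt * k1[0], l + 0.5 * dt * k1[1])
        k3 = f(h + 0.5 * dt * k2[0], l + 0.5 * dt * k2[1])
        k4 = f(h + dt * k3[0], l + dt * k3[1])
        h += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        l += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        path.append((t, h, l))
    return path


def invariant(h: float, l: float, p: Params) -> float:
    """Quantity conserved by the exact dynamics; its drift measures integration error."""
    alpha, beta, delta, gamma = p
    return delta * h - gamma * math.log(h) + beta * l - alpha * math.log(l)
```

For example, `simulate(30.0, 4.0, (0.55, 0.028, 0.024, 0.80), t_end=60.0)` (illustrative values only) gives lynx numbers that rise and fall several times over that span, while `invariant` stays essentially constant along the path, a quick check that the integrator is behaving. Actually fitting such a model means putting priors on the four rates (plus initial states and measurement noise) and estimating them from data, which this sketch does not attempt.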
Or, to put it into numbers, when they fit their model to the first 80 years and predict to the next 34, their root mean square out-of-sample error is 1480 (see scale of data above). In contrast, the standard model fit to these data (the SETAR model of Tong, 1990) has more than twice as many parameters but gets a worse root mean square error of 1600, even when that model is fit to the entire dataset. (If you fit the SETAR or any similar autoregressive model to the first 80 years and use it to predict the next 34, the predictions are a disaster: the predicted values quickly go toward the mean and can’t even attempt to track the curve.)
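To make the evaluation and the mean-reversion point concrete, here is a minimal sketch of the out-of-sample root mean square error on an 80/34 train/test split, together with the simplest autoregression, an AR(1) fit by least squares around the sample mean. This is not the SETAR model of Tong (1990), and the cyclic series in the usage note is made up, not the lynx data; AR(1) is chosen only because its forecast has a closed form, mu + phi^h (x_last - mu), that visibly collapses to the mean. The same long-horizon collapse holds for any stationary autoregression, since its h-step forecast converges to the mean as h grows.

```python
import math
from typing import List, Sequence, Tuple


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_ar1(train: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t] - mu = phi * (x[t-1] - mu) + noise, mu = sample mean."""
    mu = sum(train) / len(train)
    dev = [x - mu for x in train]
    num = sum(dev[t] * dev[t - 1] for t in range(1, len(dev)))
    den = sum(d * d for d in dev[:-1])
    return mu, num / den


def forecast_ar1(last: float, mu: float, phi: float, horizon: int) -> List[float]:
    """Iterated multi-step forecast; step h equals mu + phi**h * (last - mu)."""
    preds, x = [], last
    for _ in range(horizon):
        x = mu + phi * (x - mu)  # each step shrinks the deviation from the mean by phi
        preds.append(x)
    return preds
```

Usage on a hypothetical noiseless 10-year cycle of 114 points: `series = [1000 + 800 * math.sin(2 * math.pi * t / 10) for t in range(114)]`, then `mu, phi = fit_ar1(series[:80])` and `preds = forecast_ar1(series[79], mu, phi, 34)`. Because the fitted `phi` is below 1, the forecasts settle onto `mu` within a couple of cycles, so `rmse(series[80:], preds)` is roughly the size of the cycle's own spread: the model stops tracking the curve, which is the behavior described above.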
As Reilly and Zeringue note, the above graph shows potential room for improvement in the model, but even as is, it shows the huge benefits that can be obtained by attempting to model the underlying process rather than simply fitting the data using a conventional family of models.
(It’s funny for me to emphasize this point, given how often I use conventional models such as linear and logistic regression.)
P.S. The title and text above have been modified to reflect comments below with reference to models fit to the lynx data in the ecology literature. There appears to be too little communication between ecologists and statisticians. The statistical point above still holds (a simple model with some reasonable structure can outperform a generic data-fitting model such as an autoregression), but you should probably check out some of the references given in the comments if you’re interested in the lynx example or ecology models more generally.