We need better default plots for regression.

Robin Lee writes:

To check for linearity and homoscedasticity, we are taught in many statistics classes to plot residuals against fitted y values.

However, plotting residuals against fitted y values has always been a confusing practice that I know I should use but can’t quite explain why.
It was not until this week that I understood the concept of “Y fitted values as a function of all explanatory variables.”

Since homoscedasticity means that the conditional variance of the error term is constant and does not vary as a function of the explanatory variables, it seems logical to compare residuals (the error term) with x (the explanatory variables).

Why don’t we plot residuals against observed x values? Or why don’t we teach students to plot residuals against the x variables?

One reason I can think of to explain this practice is that while it is possible to plot residuals against observed x in a single-variable regression model, it becomes tricky when there are two or more X variables: it would require multiple scatterplots, one per X variable.
Since the fitted Y values are a function of all the explanatory variables, it is convenient to make just one plot.

Another reason is that this practice is reinforced by tool availability.
Calling plot() on an lm object in R returns four diagnostic plots, the first of which is residuals against fitted values.
Plotting residuals against each of the X variables requires users to know how to extract the residuals from the lm model object.

Are there any other reasons, from a statistical, communication, or teaching angle, that explain the conventional practice of plotting residuals against fitted y values?

My reply: Yes, we do recommend plotting residuals vs. individual x variables. Jennifer and I give an example in the logistic regression chapter of our book. But I guess this is not so well known, given that you asked this question! Hence this post.

My other reason for posting this is to echo Robin’s point about the importance of defaults. We need better default plots for regression so that practitioners will do a better job learning about their data. Otherwise we’re looking at endless vistas of useless q-q plots.
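To make the comparison in Robin’s letter concrete, here is a minimal R sketch on simulated (hypothetical) data: the residuals-vs-fitted plot that plot() on an lm object produces by default, followed by residual plots against each individual predictor.

```r
# Simulated example data (hypothetical; any data frame with numeric predictors works)
set.seed(123)
d <- data.frame(x1 = rnorm(200), x2 = runif(200))
d$y <- 1 + 0.5 * d$x1 - 2 * d$x2 + rnorm(200)

fit <- lm(y ~ x1 + x2, data = d)

# Default diagnostic: residuals vs. fitted values (the first of R's four plots)
plot(fit, which = 1)

# Residuals vs. each individual predictor, as recommended above
r <- residuals(fit)
for (v in c("x1", "x2")) {
  plot(d[[v]], r, xlab = v, ylab = "residual",
       main = paste("Residuals vs.", v))
  abline(h = 0, lty = 2)
}
```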

27 thoughts on “We need better default plots for regression.”

  1. Well, I suppose different people are taught differently. I had only ever been taught to plot residuals against x, and had never even heard of plotting residuals against fitted y.

    • They might have taught me to plot against fitted y, but I don’t think I understood at the time and just always plotted against individual x’s.

  2. One potential benefit of plotting residuals vs. fitted values is that it can suggest whether a transformation of the response is appropriate. This can be hard to see if you are plotting against individual explanatory variables.
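
    For instance, a minimal sketch on made-up data (the lognormal setup below is purely illustrative): a response that really belongs on the log scale produces a curved, fanning residuals-vs-fitted plot, which is harder to see from any single predictor.

    ```r
    # Hypothetical data where y is linear on the log scale
    set.seed(1)
    x1 <- runif(300); x2 <- runif(300)
    y  <- exp(1 + x1 + x2 + rnorm(300, sd = 0.3))

    bad  <- lm(y ~ x1 + x2)        # untransformed response: curved, fanning residuals
    good <- lm(log(y) ~ x1 + x2)   # log-transformed response: roughly flat band

    par(mfrow = c(1, 2))
    plot(fitted(bad),  residuals(bad),  main = "y untransformed")
    plot(fitted(good), residuals(good), main = "log(y)")
    ```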

  3. > It was not until this week that I understood the concept of “Y fitted values as a function of all explanatory variables”

    Before diving into conditional variance, diagnostic plots of residuals, etc., didn’t they teach you to plot the fitted values (and the original y values) against the explanatory variable in the single-variable case?

    I think the teaching failure here, if any, may be not explaining clearly that the regression line becomes a regression plane when we have two explanatory variables, and an N-dimensional hyperplane in the general case (the latter is a bit more difficult to visualize, though).

    • > regression line becomes a regression plane

      The situation may be different now, but Kruschke’s Doing Bayesian Data Analysis is the only introductory textbook I can recall that actually depicts regression planes, as well as the more complicated surfaces that arise when interaction terms are included. I think these plots are very evocative and useful for conceptualizing what is going on in multiple regression (see the sketch at the end of this comment).

      These visualizations might also be useful for disabusing people of the idea that a significant (or credible) regression coefficient corresponds to a direct causal link from that specific predictor to the outcome, since the situation can be quite a bit more complicated in the context of other variables (even without interaction terms; I believe Andrew has some examples of this on this blog).
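
      For what it’s worth, a regression plane is easy to draw in base R. A minimal sketch on made-up data (the grid and viewing angles are arbitrary choices):

      ```r
      # Hypothetical two-predictor example
      set.seed(2)
      d <- data.frame(x1 = runif(100), x2 = runif(100))
      d$y <- 1 + 2 * d$x1 - 1.5 * d$x2 + rnorm(100, sd = 0.3)
      fit <- lm(y ~ x1 + x2, data = d)

      # Evaluate the fitted plane on a grid and draw it as a surface
      g <- seq(0, 1, length.out = 25)
      z <- outer(g, g, function(a, b)
        predict(fit, newdata = data.frame(x1 = a, x2 = b)))
      persp(g, g, z, xlab = "x1", ylab = "x2", zlab = "fitted y",
            theta = 40, phi = 20, ticktype = "detailed")
      ```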

      • I don’t know, maybe it is a field-specific thing or maybe introductory texts are just getting worse over time. This is from the first edition (1969) of Wonnacott & Wonnacott’s “Introductory Statistics”: https://imgur.com/a/H5wFLm7

        That’s the second page of the chapter on multiple regression and there are similar charts when collinearity and dummy variables are discussed. It was a popular textbook in economics and business school, maybe not anymore (it seems that the most recent edition is from 1990).

        • The following review of the book on Amazon is sadly relevant:

          “I really enjoyed this book (mine is the 5th edition) and for one simple reason – it was written before the age of the PC and, I say with a shudder, Excel.

          “With these earlier books you really had to know the maths and this book teaches the maths but in a very intuitive way. The authors get you to THINK about the problem before introducing the maths. This means the formulae, when presented, are actually logical and thereby understandable. It also has the benefit of helping you learn how to think about problems and how to frame them in a mathematical/statistical way.

          “More recent stats books assume you are using software and are thereby less rigorous mathematically. It’s more “here’s a problem and here’s the menu item you use to solve it”. This is a shame and the outcome is that though students are great software technicians they can be poor statisticians. Put a pen and paper in front of them and they’re clueless. In comparison this book says “here’s a problem, let’s think about it a while.”

          “If you are interested in statistics always go for the books published before the PC age and this is an excellent one to start with. That said I’m only giving it 4 stars as they have left out some pretty introductory topics, such as a discussion on the Poisson distribution, though the authors do state their reasons for these omissions in the preface.”

        • Cool, thanks for sharing!

          The reviewer echoes my own concern with the way stats is often approached these days, and even worse “data science” (a name I still don’t understand, but that’s a whole other kettle of fish).

          Ironically, the power of computers to free us from tedious arithmetic *should* allow us to teach the thinking behind the math more clearly, rather than less. At least that’s what I try, in my fumbling way, to do (though I still subject students to plenty of tedious math so they can appreciate what the computer is doing for them).

        • Let me offer a contrary view. I don’t think the computer has ruined statistics education the way you seem to be suggesting – although for some people and some instruction that is certainly true. I personally use JMP and teach that way, and find that it facilitates looking at your data at all times and thinking about what it means. Yes, it allows you to use the mouse to easily do things – some of which may stand in the way of the thinking behind the math. But it need not do that, and it is the job of the instructor to steer people away from misusing the tools and towards using them intelligently. As for many current books, they are bad for instruction – but not all. I don’t deny that some classics are better than many current texts – I agree. But there are exceptions. And many students who struggle with math are still able to think about and look at data because of computing technology.

        • I think we agree! I was only remarking on the frustrations I’ve had with many modern stats texts, especially those geared toward non-mathy, applied settings like Psychology. They seem to assume that students will only accept doing things that are “easy” and as such the books make little effort to get them engaged and thinking about what statistics/data really mean and instead treat it as a mechanical exercise. As an instructor, I actually do use computers quite a bit, and I agree that when properly used they are a great aid to thinking, rather than a substitute for it.

        • Dale said, “it is the job of the instructor to steer people away from misusing the tools and towards using them intelligently.”

          Yes! And I’d add that it is also the job of the textbook writers.

        • gec says:

          “They seem to assume that students will only accept doing things that are “easy”… ”

          Yeah, but it’s sadly amazing how many instructors like it that way too. That way they don’t have to bother with the actual thinking part; they just have to show students how to plug in the numbers. It’s a lot easier to grade that way too – you can grade on simple things.

        • One can include visualizations that are even better than those in that text. They’re dynamic and show twisting planes for interactions and such. And through such visualizations, one can provide an understanding to those who are not math inclined using computers. The fact that some instructors don’t do so isn’t necessarily a consequence of the age of software. It has been true in every age that there are good instructors and not so good.

        • +1 (In particular, teaching goes best when the instructor and the textbook are trying to teach the same thing. Unfortunately, it is often hard to find a textbook that teaches the right things.)

    • …and more generally, that all of these linear models (any polynomial, for that matter) are best thought of as Taylor series approximations of something richer and truer in the neighborhood of the space in which the data were collected.

      My other concern with the default plot recommendations is that they often reinforce the idea that the data are “supposed to look like the likelihood” or something like that. Even the residuals don’t really *need* to look like the likelihood – that’s only really important for NHST.

  4. My understanding (and please, correct me if I’m mistaken) is that with Bayesian modeling and the practice of posterior predictive checks (or predictive checks more generally), there is no need for all this mess over plotting residuals or not. My workflow is as follows: I make assumptions in my model (say, y ~ N(mu, sigma2), mu = a + bx, and add some priors). Then I see whether yhat is behaving according to the data. This will make me check whether some of my assumptions are wrong. Maybe I need to model sigma2. Maybe I need to use a distribution other than the normal. Maybe I have more prior info that should be included, etc.
    In this way, there isn’t a ton of statistical tests that I need to perform. It’s just iterating over predictive checks until I’m satisfied with the results of my assumptions and model.
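
    A minimal sketch of that workflow, assuming the rstanarm package (any Bayesian regression tool that produces posterior predictive draws would do; the data here are made up):

    ```r
    library(rstanarm)

    # Hypothetical data
    set.seed(3)
    d <- data.frame(x = rnorm(200))
    d$y <- 1 + 2 * d$x + rnorm(200)

    # y ~ N(a + b*x, sigma^2), with rstanarm's default weakly informative priors
    fit <- stan_glm(y ~ x, data = d, family = gaussian(), refresh = 0)

    # Posterior predictive checks: do replicated datasets look like the observed y?
    pp_check(fit)                                 # density of y vs. replicated datasets
    pp_check(fit, plotfun = "stat", stat = "sd")  # check a particular statistic, e.g. the spread
    ```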

  5. I prefer plotting residuals against fitted values. However, it would be great if the plot defaults could also show residuals vs. fitted for the null model. Then it would be easy to assess the impact of the fit. I usually get my students to do that, and it adds tremendously to their ability to understand the default residuals-vs-fitted plot.

    Just in case the questioner is reading: for the cases where it would be most appropriate to plot residuals against X – simple regressions – the plot against the fitted values looks exactly the same. So for simple regression it really makes no difference. When you have multiple regression, it’s just multiple ways of looking at the data.
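
    A minimal sketch of that side-by-side comparison, assuming an ordinary lm fit (variable names and data are made up):

    ```r
    # Hypothetical data
    set.seed(4)
    d <- data.frame(x1 = rnorm(150), x2 = rnorm(150))
    d$y <- 2 + d$x1 + 0.5 * d$x2 + rnorm(150)

    fit  <- lm(y ~ x1 + x2, data = d)
    null <- lm(y ~ 1, data = d)   # intercept-only (null) model

    # Use the same vertical scale so the reduction in residual spread is visible
    ylim <- range(residuals(null))
    par(mfrow = c(1, 2))
    plot(fitted(null), residuals(null), ylim = ylim, main = "Null model")
    plot(fitted(fit),  residuals(fit),  ylim = ylim, main = "Fitted model")
    ```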

  6. A couple of years ago I was working with some air quality scientists, and their standard regression diagnostic was to fit another linear regression of the response against the fitted values and visually examine the fit together with the coefficient of determination (see the sketch at the end of this comment). It appears that the reason they did this was that Excel was their “default” statistical software package. So yes, default options do matter.

    Regarding teaching and model visualisation, I would put in a plug for the CAST e-books. These e-books (which are statistics textbooks together with interactive data visualisations and videos in a single interface) allow students to dynamically alter plots and see the results of different datasets and assumptions on data, models, and diagnostic plots. And they’re free.

    https://cast.idems.international/

    Disclaimer: these ebooks were created by my friend and former colleague Doug Stirling, and I have used them extensively in my teaching.
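
    On the response-against-fitted regression mentioned above: for an ordinary least-squares fit with an intercept, that auxiliary regression has intercept 0 and slope 1 (up to numerical precision), and its R² equals the original model’s R², so it mostly repackages information the fit already reports. A minimal sketch on made-up data:

    ```r
    # Hypothetical data
    set.seed(5)
    d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
    d$y <- 1 + d$x1 - d$x2 + rnorm(100)

    fit <- lm(y ~ x1 + x2, data = d)
    aux <- lm(d$y ~ fitted(fit))   # regress the observed response on the fitted values

    coef(aux)                      # intercept ~0, slope ~1
    summary(aux)$r.squared         # equals summary(fit)$r.squared
    summary(fit)$r.squared
    ```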

  7. Can’t we use the Breusch-Pagan test to see if we have heteroscedasticity? In econometrics, we are taught to run the BP test, yet I’ve seen practically zero research papers doing this (or rather publishing the results of the BP test).
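
    For reference, a minimal sketch of running that test in R, assuming the lmtest package (the data are made up):

    ```r
    # install.packages("lmtest")  # if not already installed
    library(lmtest)

    # Hypothetical data with variance increasing in x
    set.seed(6)
    x <- runif(200)
    y <- 1 + 2 * x + rnorm(200, sd = 0.2 + x)

    fit <- lm(y ~ x)
    bptest(fit)   # Breusch-Pagan test; a small p-value suggests heteroscedasticity
    ```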

      • A commonly given reason for the importance of equal variance is that heteroscedasticity implies the sd estimates are unreliable: is that not a concern?

        A related question, if it is in fact important to check for equal variance to interpret the standard errors for the independent variable of interest:

        (2) When interpreting the average estimate and sd for the independent variable of interest in a multilevel model (not within a single group), is the equal variance assumption important only “overall,” or also within each of the groups/levels?

        Many thanks to anyone with advice or references/suggestions on where to read more!

  8. We mustn’t lose track of the point that regression packages should automatically provide all the output that the user *ought* to want, even if he doesn’t know he ought to want a particular output.

    Also important is that the packages should *not* provide info that is usually not useful – for example, an F-test of the hypothesis that the coefficients on the X variables are jointly zero.

    In the case of residual-vs-x plots for checking heteroscedasticity, with modern fast computers these should be produced by default, written to a single file that does not display unless the user asks for it.

    Since that is NOT the case, can someone out there give us code for Stata, Python, and R that we can add to every regression we run and comment out if we don’t want to use up CPU running it? You could put it on GitHub and let Prof. Gelman know about it so he could spread the word. I’m not good enough at programming to do it myself, but it would be a public service, and it would be appropriate to prominently include your name in the code, so it would be in your self-interest. (When, as an assistant professor, I was writing my book, Games and Information, my senior colleague Steve Lippmann told me, “It won’t help you get tenure, but it will help you get another job.”)
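
    Not the full cross-package solution being asked for, but here is a hedged, R-only sketch as a starting point: a small helper (the name and defaults are invented for this example, not from any package) that takes a fitted lm and writes residuals-vs-fitted plus residuals-vs-each-predictor plots to a PDF that only appears when you choose to open it.

    ```r
    # Hypothetical helper, not from any package: write residual diagnostics to a PDF
    residual_plots_to_file <- function(fit, file = "residual_plots.pdf") {
      stopifnot(inherits(fit, "lm"))
      X <- model.frame(fit)[-1]   # the predictors as used in the fit
      r <- residuals(fit)
      pdf(file)
      on.exit(dev.off())
      plot(fitted(fit), r, xlab = "fitted values", ylab = "residual",
           main = "Residuals vs. fitted")
      abline(h = 0, lty = 2)
      for (v in names(X)) {
        if (is.numeric(X[[v]])) {
          plot(X[[v]], r, xlab = v, ylab = "residual",
               main = paste("Residuals vs.", v))
          abline(h = 0, lty = 2)
        } else {
          boxplot(r ~ X[[v]], xlab = v, ylab = "residual",
                  main = paste("Residuals by", v))
        }
      }
      invisible(file)
    }

    # Usage (hypothetical data): residual_plots_to_file(lm(y ~ x1 + x2, data = d))
    ```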
