Question 10 of our Applied Regression final exam (and solution to question 9)

Here’s question 10 of our exam:

10. For the above example, we then created indicator variables, age18_29, age30_44, age45_64, and age65up, for four age categories. We then fit a new regression:

lm(formula = weight ~ age30_44 + age45_64 + age65up)
             coef.est coef.se
(Intercept)     157.2     5.4
age30_44TRUE     19.1     7.0
age45_64TRUE     27.2     7.6
age65upTRUE       8.5     8.7
  n = 2009, k = 4
  residual sd = 119.4, R-Squared = 0.01

Make a graph of weight versus age (that is, weight in pounds on y-axis, age in years on x-axis) and draw the fitted regression model. Again, this graph should be consistent with the above computer output.
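Because age18_29 is the omitted baseline, the fitted model here is a step function rather than a line. Each age category gets a horizontal segment at the intercept plus that category's coefficient. Below is a minimal Python sketch of that arithmetic, using the point estimates above. The names COEFS, SEGMENTS and fitted_steps, and the segment endpoints (including 90 as the upper age), are illustrative assumptions and are not part of the original output.

```python
# Point estimates from the lm() output for question 10.
# age18_29 is the omitted baseline, so its fitted mean is just the intercept.
INTERCEPT = 157.2
COEFS = {"age18_29": 0.0, "age30_44": 19.1, "age45_64": 27.2, "age65up": 8.5}

# Hypothetical x-extent of each horizontal segment (ages in years);
# the upper bound of 90 is assumed, matching the age range in question 9.
SEGMENTS = {"age18_29": (18, 30), "age30_44": (30, 45),
            "age45_64": (45, 65), "age65up": (65, 90)}

def fitted_steps():
    """Return (category, age_start, age_end, fitted_weight) for each segment."""
    return [(cat, lo, hi, round(INTERCEPT + COEFS[cat], 1))
            for cat, (lo, hi) in SEGMENTS.items()]

for cat, lo, hi, w in fitted_steps():
    print(f"{cat}: horizontal segment from age {lo} to {hi} at {w} lb")
```

Drawing these four segments gives the fitted model. The jumps between segments mark where the age category changes, and the non-monotone shape (up, up, then down for 65+) is something a single straight line cannot show.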

And the solution to question 9:

9. We downloaded data with weight (in pounds) and age (in years) from a random sample of American adults. We created a new variable, age10 = age/10. We then fit a regression:

lm(formula = weight ~ age10)
            coef.est coef.se
(Intercept)    161.0     7.3
age10            2.6     1.6
  n = 2009, k = 2
  residual sd = 119.7, R-Squared = 0.00

Make a graph of weight versus age (that is, weight in pounds on y-axis, age in years on x-axis). Label the axes appropriately, draw the fitted regression line, and make a scatterplot of a bunch of points consistent with the information given and with ages ranging roughly uniformly between 18 and 90.

The x-axis should go from 18 to 90, or from 0 to 90, and the y-axis should go from approximately 100 to 300, or from 0 to 300. It’s easy enough to draw the regression line, as the intercept and slope are right there. The scatterplot should have enough vertical spread to be consistent with a residual sd of 120. Recall that approximately 2/3 of the points should fall within +/- 1 sd of the regression line, measured vertically.
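One way to see what "consistent with the information given" looks like is to simulate fake data from the fitted model. The sketch below makes several assumptions that the output does not specify. It draws ages uniformly on [18, 90], as the question suggests, and it uses normally distributed errors with sd 119.7. It then checks the 2/3 rule. The function names are hypothetical.

```python
import random

# Point estimates from the question-9 output.
A, B, SIGMA = 161.0, 2.6, 119.7  # intercept, slope per 10 years of age, residual sd

def fitted(age):
    """Fitted weight in pounds at a given age in years (age10 = age / 10)."""
    return A + B * age / 10

def simulate(n=2009, seed=1):
    """Fake (age, weight) pairs: ages uniform on [18, 90], normal errors (assumed)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        age = rng.uniform(18, 90)
        points.append((age, fitted(age) + rng.gauss(0, SIGMA)))
    return points

def share_within_one_sd(points):
    """Fraction of points within +/- 1 residual sd of the fitted line, vertically."""
    return sum(abs(w - fitted(a)) <= SIGMA for a, w in points) / len(points)

print(round(fitted(18), 1), round(fitted(90), 1))  # endpoints of the fitted line
print(round(share_within_one_sd(simulate()), 2))   # should land near 0.68
```

Under these assumptions, some of the simulated weights come out negative. That is the concern raised in comment 4 below.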

Common mistakes

Everyone could draw the regression line; nearly nobody could draw a good scatterplot. Typical scatterplots were very tightly clustered around the regression line, not at all consistent with a residual sd of 120 and an R-squared of essentially zero.
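A quick arithmetic check shows why a tight scatterplot contradicts the output. R-squared is the variance of the fitted values divided by the total variance. The sketch assumes that ages are uniform between 18 and 90, so age10 runs from 1.8 to 9.0. It also treats the estimates as exact, which ignores sampling noise. The function name is illustrative.

```python
def implied_r_squared(slope=2.6, sigma=119.7, lo=1.8, hi=9.0):
    """R^2 = explained variance / total variance, with age10 ~ Uniform(lo, hi) (assumed)."""
    var_x = (hi - lo) ** 2 / 12      # variance of a uniform distribution
    explained = slope ** 2 * var_x   # variance of the fitted values
    return explained / (explained + sigma ** 2)

print(round(implied_r_squared(), 3))          # 0.002, with the residual sd from the output
print(round(implied_r_squared(sigma=10), 2))  # 0.23, for a tightly clustered sketch
```

A sketch with points within about 10 pounds of the line would imply an R-squared of roughly 0.2, not the 0.00 reported.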

I guess we should have more assignments where students draw scatterplots and sketch possible data.

9 thoughts on “Question 10 of our Applied Regression final exam (and solution to question 9)”

  1. Alternatively, I suppose you could make it multiple-choice: show three or four little thumbnail scatterplots and ask them which one looks consistent with a residual SD of 120 and a very small R-squared.

  2. How do you mark a question like this? It seems quite subjective in nature: for example, is a bunch of points 5 or 50? Drawing 5 points that were close-ish to the regression line may well be consistent with the regression criteria, but you would expect to see the variation clearly with 50 points.

    • Tom:

      1. From the exam problem: “We downloaded data with weight (in pounds) and age (in years) from a random sample of American adults.” You wouldn’t be downloading data on a random sample of 5 people!

      2. We’d sketched many such scatterplots in class during the semester so the students had an idea of what they were supposed to do.

  3. Aïeeeee: I have to take exception to this:

    > Make a graph of weight versus age (that is, weight in pounds on y-axis, age in years on x-axis). Label the axes appropriately, draw the fitted regression line, and make a scatterplot of a bunch of points consistent with the information given and with ages ranging roughly uniformly between 18 and 90.

    Being a terrible sketcher (and that’s the understatement of the year…), I’d never have been able to pass this one. Offering the alternative of writing a small code snippet to produce the required graph would have been a good way to spot who remembered the salient points of the graph…

  4. I’m missing something here… with an average weight of 166 at age 18 and 184 at age 90, and a residual sd of 120, how do you avoid generating negative weights?

  5. You’re right, the basic regression model does not require normality. What distribution would you recommend using here? I thought the answer required simulating from weight = a + b*age10 + error.
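The negative-weights concern in comment 4 can be put in rough numbers. Suppose the errors in weight = a + b*age10 + error are normal. That is an assumption made here for illustration, since the output does not say so and the regression model does not require it. The share of simulated weights below zero at a given age then follows from the normal CDF. The sketch uses the question-9 estimates, and the function names are hypothetical.

```python
import math

A, B, SIGMA = 161.0, 2.6, 119.7  # question-9 estimates: intercept, slope per decade, residual sd

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_negative(age):
    """P(simulated weight < 0) at a given age, assuming normal errors."""
    mean = A + B * age / 10
    return normal_cdf(-mean / SIGMA)

for age in (18, 90):
    print(age, round(prob_negative(age), 3))
```

Roughly 6 to 8 percent of normal draws fall below zero across the age range. That is why the choice of error distribution matters when sketching or simulating these data.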
