Question 10 of our Applied Regression final exam (and solution to question 9)

Here’s question 10 of our exam:

10. For the above example, we then created indicator variables, age18_29, age30_44, age45_64, and age65up, for four age categories. We then fit a new regression:

lm(formula = weight ~ age30_44 + age45_64 + age65up)
             coef.est coef.se
(Intercept)     157.2     5.4
age30_44TRUE     19.1     7.0
age45_64TRUE     27.2     7.6
age65upTRUE       8.5     8.7
  n = 2009, k = 4
  residual sd = 119.4, R-Squared = 0.01

Make a graph of weight versus age (that is, weight in pounds on y-axis, age in years on x-axis) and draw the fitted regression model. Again, this graph should be consistent with the above computer output.
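The numbers needed for this graph can be read straight off the output. Because age18_29 is the omitted baseline category, the intercept is the fitted weight for that group, and each other group's fitted weight is the intercept plus its coefficient, so the fitted model is a step function of age. Here is a minimal sketch of that arithmetic in Python (rather than the R used for the fit). The names `fitted` and `predict` and the category labels are made up for illustration.

```python
# Coefficients copied from the regression output above.
# age18_29 is the omitted (baseline) category, so its offset is 0.
intercept = 157.2
offsets = {"18-29": 0.0, "30-44": 19.1, "45-64": 27.2, "65+": 8.5}

# Fitted mean weight per age category: intercept plus that category's coefficient.
fitted = {group: round(intercept + off, 1) for group, off in offsets.items()}


def predict(age):
    """Fitted weight in pounds for an age in whole years (a step function)."""
    if age < 30:
        return fitted["18-29"]
    if age < 45:
        return fitted["30-44"]
    if age < 65:
        return fitted["45-64"]
    return fitted["65+"]


for group, weight in fitted.items():
    print(f"ages {group}: fitted weight {weight} pounds")
```

On the graph, these are four horizontal line segments, one per age category, at the heights printed above.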

And the solution to question 9:

9. We downloaded data with weight (in pounds) and age (in years) from a random sample of American adults. We created a new variable, age10 = age/10. We then fit a regression:

lm(formula = weight ~ age10)
            coef.est coef.se
(Intercept)    161.0     7.3
age10            2.6     1.6
  n = 2009, k = 2
  residual sd = 119.7, R-Squared = 0.00

Make a graph of weight versus age (that is, weight in pounds on y-axis, age in years on x-axis). Label the axes appropriately, draw the fitted regression line, and make a scatterplot of a bunch of points consistent with the information given and with ages ranging roughly uniformly between 18 and 90.

The x-axis should go from 18 to 90, or from 0 to 90, and the y-axis should go from approximately 100 to 300, or from 0 to 300. It’s easy enough to draw the regression line, as the intercept and slope are right there. The scatterplot should have enough vertical spread to be consistent with a residual sd of 120. Recall that approximately 2/3 of the points should fall within 1 sd of the regression line in vertical distance.
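For anyone who would rather simulate than sketch, here is a rough Python version of the same exercise. It is only a sketch under stated assumptions: the errors are drawn from a normal distribution, which the regression output does not tell us, and the seed is arbitrary. One consequence of the normal assumption is that some simulated weights come out negative, so this shows how much vertical spread is needed rather than what real weights look like.

```python
import random

random.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Values copied from the regression output above.
n = 2009
intercept, slope, resid_sd = 161.0, 2.6, 119.7

# Ages roughly uniform between 18 and 90, as the question specifies.
ages = [random.uniform(18, 90) for _ in range(n)]

# Assumption: normally distributed errors (not stated in the output).
weights = [intercept + slope * (a / 10) + random.gauss(0, resid_sd) for a in ages]

# Share of points within 1 residual sd of the line, in vertical distance.
within = sum(
    abs(w - (intercept + slope * a / 10)) <= resid_sd
    for a, w in zip(ages, weights)
) / n

# Share of simulated weights that are impossible (below zero).
negative = sum(w < 0 for w in weights) / n

print(f"within 1 sd of the line: {within:.2f}")
print(f"negative weights: {negative:.2f}")
```

Plotting `weights` against `ages` with any graphics package gives a cloud that nearly swamps the regression line, which is the point of the question.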

Common mistakes

Everyone could draw the regression line; almost nobody could draw a good scatterplot. Typical scatterplots were very tightly clustered around the regression line, not at all consistent with a residual sd of 120 and an R-squared of essentially zero.
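A back-of-the-envelope check shows why the two numbers go together. This sketch assumes ages are exactly uniform on 18 to 90 (the question says only "roughly uniformly") and ignores the uncertainty in the estimated slope. The variable names are made up for illustration.

```python
import math

# Values copied from the regression output for question 9.
slope_per_decade = 2.6
resid_sd = 119.7

# Assumption: ages exactly uniform on [18, 90] years.
# The sd of a uniform distribution is its width divided by sqrt(12);
# dividing by 10 converts years to decades to match age10.
age10_sd = (90 - 18) / math.sqrt(12) / 10

# Spread in weight explained by the regression line.
explained_sd = slope_per_decade * age10_sd

# R-squared = explained variance / (explained variance + residual variance).
r_squared = explained_sd**2 / (explained_sd**2 + resid_sd**2)

print(f"sd of fitted values: {explained_sd:.1f} pounds")
print(f"implied R-squared: {r_squared:.3f}")
```

The line accounts for a spread of only about 5 pounds against a residual sd of about 120, so the implied R-squared is about 0.002, which rounds to the 0.00 in the output. A scatterplot hugging the line would imply an R-squared near 1 instead.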

I guess we should have more assignments where students draw scatterplots and sketch possible data.

9 Comments

  1. Brent Hutto says:

    Alternatively, I suppose you could multiple-choice it and show three or four little thumbnail scatterplots, ask them which one looks consistent with a residual SD of 120 and very small R-squared.

  2. Jeff Walker says:

    I like this question.

  3. Tom says:

    How do you mark a question like this? It seems quite subjective in nature – for example, is a bunch of points 5 or 50? Drawing 5 points that were close-ish to the regression line may well be consistent with the regression criteria but you would expect to see the variation clearly with 50 points.

    • Andrew says:

      Tom:

      1. From the exam problem: “We downloaded data with weight (in pounds) and age (in years) from a random sample of American adults.” You wouldn’t be downloading data on a random sample of 5 people!

      2. We’d sketched many such scatterplots in class during the semester so the students had an idea of what they were supposed to do.

  4. Emmanuel Charpentier says:

    Aïeeeee: I have to take exception to this:

    > Make a graph of weight versus age (that is, weight in pounds on y-axis, age in years on x-axis). Label the axes appropriately, draw the fitted regression line, and make a scatterplot of a bunch of points consistent with the information given and with ages ranging roughly uniformly between 18 and 90.

    Being a terrible sketcher (and that’s the understatement of the year…), I’d never have been able to pass this one. Offering the alternative of writing a small code snippet to produce the required graph would have been a good way to spot who remembered the salient points of the graph…

  5. Kaiser says:

    I’m missing something here… with average weight 166 at age 18 and 184 at age 90, and residual sd of 120, how do you avoid generating negative weights?

  6. Kaiser says:

    You’re right, the basic regression model does not require normality. What distribution would you recommend using here? I thought the answer requires simulating from weight = a + b*age10 + error.
