Question 11 of our Applied Regression final exam (and solution to question 10)

Posted on June 11, 2019 9:56 AM by Andrew

Here’s question 11 of our exam:

11. We defined a new variable based on weight (in pounds):
heavy <- weight>200
and then ran a logistic regression, predicting “heavy” from height (in inches):
glm(formula = heavy ~ height, family = binomial(link = "logit"))
            coef.est coef.se
(Intercept) -21.51     1.60
height        0.28     0.02
---
  n = 1984, k = 2
(a) Graph the logistic regression curve (the probability that someone is heavy) over the approximate range of the data. Be clear where the line goes through the 50% probability point.

(b) Fill in the blank: near the 50% point, comparing two people who differ by one inch in height, you’ll expect a difference of ____ in the probability of being heavy.
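For readers who want to check a hand-drawn graph, here is a minimal R sketch of the fitted curve, using only the two coefficients from the output above. The plotting range of 55 to 85 inches is an assumption about plausible adult heights, not something taken from the data, and the names p_heavy, x50, and slope50 are made up for this illustration.

```r
# Coefficients copied from the glm output above
a <- -21.51   # intercept
b <- 0.28     # coefficient on height (inches)

# Fitted probability of being heavy at a given height (inverse logit)
p_heavy <- function(height) plogis(a + b * height)

# Height at which the fitted probability is 0.5, i.e. where a + b*height = 0
x50 <- -a / b

# Slope of the curve at that point: the derivative of the inverse logit
# is b * p * (1 - p), evaluated here at p = 0.5
slope50 <- b * 0.5 * 0.5

# Plot over an ASSUMED range of heights (55 to 85 inches is a guess)
curve(p_heavy(x), from = 55, to = 85, ylim = c(0, 1),
      xlab = "Height (inches)", ylab = "Pr(heavy)")
abline(h = 0.5, lty = 2)   # 50% probability
abline(v = x50, lty = 2)   # height where the curve crosses 50%
```

The dashed lines mark where the curve crosses 50%, and slope50 is the change in probability per inch of height near that point.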

And the solution to question 10:

10. For the above example, we then created indicator variables, age18_29, age30_44, age45_64, and age65up, for four age categories. We then fit a new regression:
lm(formula = weight ~ age30_44 + age45_64 + age65up)
             coef.est coef.se
(Intercept)     157.2     5.4
age30_44TRUE     19.1     7.0
age45_64TRUE     27.2     7.6
age65upTRUE       8.5     8.7
  n = 2009, k = 4
  residual sd = 119.4, R-Squared = 0.01
Make a graph of weight versus age (that is, weight in pounds on y-axis, age in years on x-axis) and draw the fitted regression model. Again, this graph should be consistent with the above computer output.

The graph of weight vs. age should be identical to that in the previous problem. Fitting a new model does not change the data. The fitted regression model is four horizontal lines: a line from x=18 to x=30 at the level 157.2, a line from x=30 to x=45 at the level 157.2 + 19.1 = 176.3, a line from x=45 to x=65 at the level 157.2 + 27.2 = 184.4, and a line from x=65 to x=90 at the level 157.2 + 8.5 = 165.7.
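The four horizontal lines can be drawn in R as below. This is only a sketch of the fitted model: the survey data are not reproduced here, so there is no scatterplot underneath, and the y-axis limits are an assumption chosen to leave room for the vertical spread of real weights. The names fitted_weight, breaks, and lev are made up for this illustration.

```r
# Coefficients copied from the lm output above
b0      <- 157.2   # intercept: baseline category, ages 18-29
b_30_44 <- 19.1
b_45_64 <- 27.2
b_65up  <- 8.5

# Fitted weight as a step function of age (in years)
fitted_weight <- function(age) {
  b0 +
    b_30_44 * (age >= 30 & age < 45) +
    b_45_64 * (age >= 45 & age < 65) +
    b_65up  * (age >= 65)
}

# Left and right endpoints of the four age categories (90 as the upper end)
breaks <- c(18, 30, 45, 65, 90)
lev    <- fitted_weight(breaks[-length(breaks)])   # level of each segment

# Empty plot with ASSUMED y limits, then one horizontal segment per category
plot(c(18, 90), c(100, 250), type = "n",
     xlab = "Age (years)", ylab = "Weight (pounds)")
segments(x0 = breaks[-length(breaks)], y0 = lev,
         x1 = breaks[-1], y1 = lev, lwd = 2)
```

With the actual data in hand, one would plot the points first and add the segments on top. The points themselves should show no jumps at ages 30, 45, and 65, since those cutpoints belong to the model and not to the data.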

Common mistakes

The biggest mistake was discretizing x, that is, plotting age as four categories rather than as a continuous variable. Another common mistake was to draw the regression line first and then draw the dots relative to the line, so that there were jumps in the underlying data at the cutpoints in the model. Also, as in the previous problem, when students drew the dots, they almost never included enough vertical spread, and their scatterplots looked nothing like what the data might look like. Again, when teaching, I need to clarify the distinction between the model and the data.

Statistical Modeling, Causal Inference, and Social Science
