Question 12 of our Applied Regression final exam (and solution to question 11)

Here’s question 12 of our exam:

12. In the regression above, suppose you replaced height in inches by height in centimeters. What would then be the intercept and slope of the regression? (One inch is 2.54 centimeters.)

And the solution to question 11:

11. We defined a new variable based on weight (in pounds):

heavy <- weight>200

and then ran a logistic regression, predicting “heavy” from height (in inches):

glm(formula = heavy ~ height, family = binomial(link = "logit"))
            coef.est coef.se
(Intercept) -21.51     1.60
height        0.28     0.02
---
  n = 1984, k = 2

(a) Graph the logistic regression curve (the probability that someone is heavy) over the approximate range of the data. Be clear where the line goes through the 50% probability point.

(b) Fill in the blank: near the 50% point, comparing two people who differ by one inch in height, you’ll expect a difference of ____ in the probability of being heavy.

(a) The x-axis should range from approximately 60 to 80 (most people have heights between 60 and 80 inches), and the y-axis should range from 0 to 1. The easiest way to draw the logistic regression curve is to first figure out where it goes through 0.5. That’s when the linear predictor equals 0, thus -21.51 + 0.28*x = 0, so x = 21.51/0.28 = 77 (to the nearest inch). At that point the curve has slope 0.28/4 = 0.07 (remember the divide-by-4 rule), and that will be enough to get something pretty close to the fitted curve.
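If you want to check a hand-drawn graph, here is a minimal sketch in R. It assumes only the two coefficients from the regression output above; the variable names (intercept, slope, x_half, slope_at_half) are mine and not part of the exam.

```r
# Coefficients copied from the regression output above
# (variable names are illustrative, not from the exam).
intercept <- -21.51
slope <- 0.28

# The curve crosses 50% where the linear predictor equals 0.
x_half <- -intercept / slope

# Divide-by-4 rule: slope of the logistic curve at the 50% point.
slope_at_half <- slope / 4

# Logistic curve over the approximate range of the data.
curve(plogis(intercept + slope * x), from = 60, to = 80, ylim = c(0, 1),
      xlab = "Height (inches)", ylab = "Pr(heavy)")

# Dashed guides marking the 50% point.
abline(h = 0.5, v = x_half, lty = 2)

# Tangent line at the 50% point, for comparison with the curve.
abline(a = 0.5 - slope_at_half * x_half, b = slope_at_half, col = "gray")
```

The gray tangent line is what the divide-by-4 rule gives you; it should hug the curve near the 50% point and drift away from it toward the short end of the height range.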

(b) As just noted, the divide-by-4 rule gives us an answer of 0.07.
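The divide-by-4 rule is an approximation, so it can be reassuring to compare it with the exact difference. The sketch below assumes we compare two people whose heights sit half an inch on either side of the 50% point; that centering is my choice for illustration, and the variable names are hypothetical.

```r
# Coefficients copied from the regression output above.
intercept <- -21.51
slope <- 0.28

# Height at which the predicted probability is 50%.
x_half <- -intercept / slope

# Two people one inch apart, centered on the 50% point.
p_taller  <- plogis(intercept + slope * (x_half + 0.5))
p_shorter <- plogis(intercept + slope * (x_half - 0.5))

diff_exact <- p_taller - p_shorter  # exact difference in probability
diff_rule  <- slope / 4             # divide-by-4 approximation

print(c(exact = diff_exact, divide_by_4 = diff_rule))
```

The two numbers should agree to about two decimal places, which is all the precision the question asks for.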

Common mistakes

In making the graphs, most of the students didn’t think about the range of x, for example having x go from 0 to 100, which doesn’t make sense as there aren’t any people close to 0 or 100 inches tall. To demand that the range of the curve fit the range of the data is not just being picky: it changes the entire interpretation of the fitted model because changing the range of x also changes the range of probabilities on the y-axis.
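To see how much the choice of x-range matters, here is a small sketch that evaluates the fitted curve at the ends of two ranges: 60 to 80 inches, and the 0 to 100 range that many students drew. It uses only the coefficients from the output above; the variable names are mine.

```r
# Coefficients copied from the regression output above.
intercept <- -21.51
slope <- 0.28

# Predicted probabilities at the ends of the approximate data range.
p_data_range <- plogis(intercept + slope * c(60, 80))

# Predicted probabilities at the ends of a 0-to-100-inch axis.
p_wide_range <- plogis(intercept + slope * c(0, 100))

print(round(p_data_range, 2))
print(round(p_wide_range, 2))
```

Over 60 to 80 inches the predicted probability runs from about 0.01 to about 0.71, so the curve never gets close to 1 within the data. Stretching the axis to 0 and 100 makes it look like a complete S-curve from essentially 0 to essentially 1, which tells a different story.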

3 thoughts on “Question 12 of our Applied Regression final exam (and solution to question 11)”

  1. Re: “[M]ost of the students didn’t think about the range of x”

    As long as most of the density is concentrated between 60-80, does the range of x really matter? That is, if I draw a line with slope 0.07 which passes through (79, 0.5) which quickly climbs (drops) to 1 (0), the answer should still be valid, right?

    (I’ll be honest, I would have set the range of x as [67, 91] or +- one foot. I am bad at estimating :| )
