Question 15 of our Applied Regression final exam (and solution to question 14)

Here’s question 15 of our exam:

15. Consider the following procedure.

• Set n = 100 and draw n continuous values x_i uniformly distributed between 0 and 10. Then simulate data from the model y_i = a + bx_i + error_i, for i = 1,…,n, with a = 2, b = 3, and independent errors from a normal distribution.

• Regress y on x. Look at the median and mad sd of b. Check to see if the interval formed by the median ± 2 mad sd includes the true value, b = 3.

• Repeat the above two steps 1000 times.

(a) True or false: You would expect the interval to contain the true value approximately 950 times. Explain your answer (in one sentence).

(b) Same as above, except the error distribution is bimodal, not normal. True or false: You would expect the interval to contain the true value approximately 950 times. Explain your answer (in one sentence).

And the solution to question 14:

14. You are predicting whether a student passes a class given pre-test score. The fitted model is, Pr(Pass) = logit^−1(a_j + 0.1x),
for a student in classroom j whose pre-test score is x. The pre-test scores range from 0 to 50. The a_j’s are estimated to have a normal distribution with mean 1 and standard deviation 2.

(a) Draw the fitted curve Pr(Pass) given x, for students in an average classroom.

(b) Draw the fitted curve for students in a classroom at the 25th and the 75th percentile of classrooms.

(a) For an average classroom, the curve is invlogit(1 + 0.1x), so it goes through the 50% point at x = -10. So the easiest way to draw the curve is to extend it outside the range of the data. But in the graph, the x-axis should go from 0 to 50. Recall that invlogit(5) = 0.99, so the probability of passing reaches 99% when x reaches 40. From all this information, you can draw the curve.

(b) The 25th and 75th percentage points of the normal distribution are at the mean +/- 0.67 standard errors. Thus, the 25th and 75th percentage points of the intercepts are 1 +/- 0.67*2, or -0.34, 2.34, so the curves to draw are invlogit(-0.34 + 0.1x) and invlogit(2.34 + 0.1x). These are just shifted versions of the curve from a, shifting by 1.34/0.1 = 13.4 to the left and the right.

Common mistakes

Students didn’t always use the range of x. The most common bad answer was to just draw a logistic curve and then put some numbers on the axes.

A key lesson that I had not conveyed well in class: draw and label the axes first, then draw the curve.

Leave a Reply

Your email address will not be published. Required fields are marked *