Here’s question 5 of our exam:

5. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true?

(a) If a question is answered correctly by students with low ability, but is missed by students with high ability, then its discrimination parameter will be near zero.

(b) It is not possible to fit an item-response model when you have more questions than students. In order to fit the model, you either need to reduce the number of questions (for example, by discarding some questions or by putting together some questions into a combined score) or increase the number of students in the dataset.

Briefly explain your answer in one to two sentences.
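For reference, here is a minimal sketch of the item-response function in one common parameterization, the two-parameter logistic (2PL) model: Pr(correct) = logit⁻¹(discrimination × (ability − difficulty)). The function name `p_correct` and the example values are hypothetical, and the course's own parameterization may differ. The sketch only shows how the discrimination parameter controls how steeply the probability of a correct answer changes with ability.

```python
import math

def p_correct(ability, difficulty, discrimination):
    """Two-parameter logistic (2PL) item response function.

    Returns Pr(correct) = logit^{-1}(discrimination * (ability - difficulty)).
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# A hypothetical item of average difficulty (0.0) at two discrimination levels:
# the larger discrimination gives a steeper curve around the item's difficulty.
abilities = [-2, -1, 0, 1, 2]
for disc in (0.5, 2.0):
    probs = [round(p_correct(theta, 0.0, disc), 2) for theta in abilities]
    print(f"discrimination={disc}: {probs}")
```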

And the solution to question 4:

4. A researcher is imputing missing responses for income in a social survey of American households, using for the imputation a regression model given demographic variables. Which of the following two statements is basically true?

(a) If you impute income deterministically using a fitted regression model (that is, imputing using Xβ rather than Xβ + ε), you will tend to impute too many people as rich or poor: A deterministic procedure overstates your certainty, making you more likely to impute extreme values.

(b) If you impute income deterministically using a fitted regression model (that is, imputing using Xβ rather than Xβ + ε), you will tend to impute too many people as middle class: By not using the error term, you’ll impute too many values in the middle of the distribution.

Option (a) is wrong and option (b) is right. We discuss this in the missing-data chapter of the book. The point prediction from a regression model gives you something in the middle of the distribution. You need to add noise in order to approximate the correct spread.
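A small simulation can illustrate the point. This is a sketch under assumed conditions, not the book's example: log income is generated from a hypothetical linear model with one demographic predictor `x` and normal error, and the true coefficients stand in for a fitted Xβ. Deterministic imputation (Xβ) compresses the spread and rarely lands in the upper tail. Stochastic imputation (Xβ + ε) roughly restores it.

```python
import random
import statistics

random.seed(2024)

# Hypothetical data-generating model (an assumption for illustration):
# log income = b0 + b1 * x + error, with error ~ Normal(0, sigma).
b0, b1, sigma = 10.0, 0.5, 0.8
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
y_actual = [b0 + b1 * xi + random.gauss(0, sigma) for xi in x]

# Deterministic imputation: the point prediction X*beta only.
y_det = [b0 + b1 * xi for xi in x]
# Stochastic imputation: X*beta plus a fresh draw of the error term.
y_stoch = [b0 + b1 * xi + random.gauss(0, sigma) for xi in x]

# "Rich" here means above the 90th percentile of the actual values.
rich_cutoff = statistics.quantiles(y_actual, n=10)[-1]

def share_rich(values):
    """Fraction of values above the 90th-percentile cutoff."""
    return sum(v > rich_cutoff for v in values) / len(values)

for label, values in [("actual", y_actual), ("deterministic", y_det), ("stochastic", y_stoch)]:
    print(f"{label:>13}: sd = {statistics.stdev(values):.2f}, "
          f"share above cutoff = {share_rich(values):.3f}")
```

Under these assumptions the deterministic imputations have roughly half the standard deviation of the actual values, and very few of them exceed the cutoff that 10% of actual values exceed.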

**Common mistakes**

Almost all the students got this one correct.

> “If you impute income deterministically using a fitted regression model (that is, imputing using Xβ rather than Xβ + ε), you will tend to impute too many people as middle class: By not using the error term, you’ll impute too many values in the middle of the distribution.”

It seems that this would also explain why forecasts of baseball team wins show less variability than actual end-of-season wins. It is fairly unlikely that any given team would have a forecast of over 100 wins, yet it is not that uncommon for some team (or teams) to actually win 100 games. A positive error term (good luck) will push some good teams beyond their predicted number of wins, while a negative error term (bad luck) will push some bad teams below their predictions.
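The baseball analogy can be sketched the same way. The assumptions here are hypothetical and deliberately simple: each team has a true win probability drawn around .500, the forecast is the expected number of wins given that true strength (so the forecaster is assumed to know it exactly), and each of 162 games is an independent win with that probability. The simulation ignores the fact that wins across a league must balance out.

```python
import random
import statistics

random.seed(7)

GAMES = 162
TEAMS = 30
SEASONS = 200

forecasts, actuals = [], []
seasons_forecast_100 = seasons_actual_100 = 0
for _ in range(SEASONS):
    # Hypothetical true win probabilities, spread around .500.
    strengths = [random.gauss(0.5, 0.05) for _ in range(TEAMS)]
    # Forecast: expected wins given the true strength (no luck term).
    season_forecast = [GAMES * p for p in strengths]
    # Actual: each game is an independent win with probability p (luck included).
    season_actual = [sum(random.random() < p for _ in range(GAMES)) for p in strengths]
    forecasts.extend(season_forecast)
    actuals.extend(season_actual)
    seasons_forecast_100 += any(w >= 100 for w in season_forecast)
    seasons_actual_100 += any(w >= 100 for w in season_actual)

print(f"sd of forecast wins: {statistics.stdev(forecasts):.1f}")
print(f"sd of actual wins:   {statistics.stdev(actuals):.1f}")
print(f"seasons with a 100-win forecast: {seasons_forecast_100} of {SEASONS}")
print(f"seasons with a 100-win team:     {seasons_actual_100} of {SEASONS}")
```

In this setup the forecasts play the role of Xβ and the game-to-game luck plays the role of ε, so the actual win totals are more spread out and reach 100 wins in more seasons than the forecasts do.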