## Question 15 of my final exam for Design and Analysis of Sample Surveys

15. A researcher conducts a random-digit-dial survey of individuals and married couples. The design is as follows: if only one person lives in a household, he or she is interviewed. If there are multiple adults in the household, one is selected at random: he or she is interviewed and, if he or she is married to one of the other adults in the household, the spouse is interviewed as well. Come up with a scheme for inverse-probability weights (ignoring nonresponse and assuming there is exactly one phone line per household).

## Solution to question 14

From yesterday:

14. A public health survey of elderly Americans includes many questions, including “How many hours per week did you exercise in your most active years as a young adult?” and also several questions about current mobility and health status. Response rates are high for the questions about recent activities and status, but there is a lot of nonresponse for the question on past activity. You are considering imputing the missing values on the question, “How many hours per week did you exercise in your most active years as a young adult?” Which of the following statements are basically correct? (Indicate all that apply.)

(a) If done reasonably well, imputation is preferred to available-case and complete-case analysis.

(b) If you do impute, you should also present the available-case and complete-case analysis and analyze how the imputed estimates differ.

(c) It is OK to include current health status variables as predictors in a model imputing past activities: anything that adds information is good when imputing.

(d) It is probably not a good idea to include current health status variables as predictors in a model imputing past activities: current health is possibly influenced by past activities, and including a causal outcome can bias estimates of a treatment variable.

(e) If you fit a regression model and impute your best prediction for each person (rather than imputing random draws from the predictive distribution), you can have problems because you will be more likely to impute extreme values.

(f) It is a good idea to fit a logistic regression predicting response/nonresponse to the question of interest as a way to look for systematic differences between respondents and nonrespondents on this question.

Solution: a, b, c, f. Not d (for imputation you want a predictive model, not a causal model) and not e (if you impute the best prediction for each case, you will understate the variation and be less, not more, likely to impute extreme values).
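The point about (e) can be checked with a small simulation. The sketch below is only an illustration on made-up data: the linear model, its coefficients, and the variable names are assumptions, not anything from the survey in the question. It fits a regression on "observed" cases, then imputes the "missing" cases two ways and compares the spread of the imputed values with the spread of the true values. It ignores uncertainty in the fitted coefficients, which a full imputation procedure would also propagate.

```python
import random
import statistics

random.seed(0)

def simulate(n):
    # Hypothetical data: x is a predictor, y is the variable to be imputed.
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [2 + 0.5 * xi + random.gauss(0, 1) for xi in x]
    return x, y

x_obs, y_obs = simulate(2000)   # cases where y was reported
x_mis, y_mis = simulate(2000)   # cases where y is treated as missing

# Least-squares fit of y on x using the observed cases only.
mx, my = statistics.fmean(x_obs), statistics.fmean(y_obs)
slope = (sum((a - mx) * (b - my) for a, b in zip(x_obs, y_obs))
         / sum((a - mx) ** 2 for a in x_obs))
intercept = my - slope * mx
resid_sd = statistics.pstdev(
    [b - (intercept + slope * a) for a, b in zip(x_obs, y_obs)]
)

# Option 1: impute the best prediction (the fitted value) for each case.
best_prediction = [intercept + slope * a for a in x_mis]
# Option 2: impute a random draw around the fitted value.
random_draw = [m + random.gauss(0, resid_sd) for m in best_prediction]

sd_true = statistics.pstdev(y_mis)
sd_best = statistics.pstdev(best_prediction)
sd_draw = statistics.pstdev(random_draw)
print(f"sd of true values:       {sd_true:.2f}")
print(f"sd of best predictions:  {sd_best:.2f}")
print(f"sd of random draws:      {sd_draw:.2f}")
```

The best-prediction imputations should come out noticeably less spread out than the true values, while the random draws should roughly match them, which is why deterministic imputation makes extreme values less likely rather than more.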

1. zbicyclist says:

15 seems almost irrelevant given the continual decline in response to telephone surveys. See, for example, the Pew time series showing telephone response rate declining from 36% in 1997 to 9% currently.

The notion of one phone line per household seems quaint.

The notion that the spouse will also be cooperative is optimistic.

But, ignoring this:

Let p = probability of selecting a phone line.
1 person household = p
2 person household, married = p
2 person household, unmarried = p/2
3 person household, unmarried = p/3
3 person household with 2 married:
married couple = 2p/3
unmarried person = p/3

Given p, inverse p weighting is obvious.
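The cases listed above all follow one rule: in a household with k adults, a person whose spouse also lives there is interviewed if either of the two is the randomly selected adult (probability 2/k), and anyone else only if selected directly (probability 1/k). Here is a minimal sketch of that rule, assuming one phone line per household and no nonresponse as in the question. The function names are hypothetical, and `p` defaults to 1 so that the result is a relative weight.

```python
from fractions import Fraction

def inclusion_probability(n_adults, spouse_in_household, p=Fraction(1)):
    """Probability that a given adult ends up being interviewed.

    n_adults: number of adults in the household.
    spouse_in_household: True if the person is married to another
        adult living in the same household.
    p: probability that the household's phone line is dialed.
    """
    if n_adults < 1:
        raise ValueError("a household needs at least one adult")
    if spouse_in_household and n_adults < 2:
        raise ValueError("a spouse in the household implies two or more adults")
    # A married person is reached through their own selection or their spouse's.
    chances = 2 if spouse_in_household else 1
    return p * Fraction(chances, n_adults)

def weight(n_adults, spouse_in_household, p=Fraction(1)):
    """Inverse-probability weight for an interviewed adult."""
    return 1 / inclusion_probability(n_adults, spouse_in_household, p)

# Relative weights (p = 1) for the household types discussed above.
for n_adults, married in [(1, False), (2, True), (2, False), (3, False), (3, True)]:
    print(n_adults, married, weight(n_adults, married))
```

Because `p` is the same for every household under these assumptions, it cancels out of any weighted mean, so the relative weights are all that is needed in practice.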
