I’m involved (with Irv Garfinkel and others) in a planned survey of New York City residents. It’s hard to reach people in the city: not everyone will answer their mail or phone, and you can’t send an interviewer door-to-door in a locked apartment building. (I think it violates IRB to have a plan of pushing all the buzzers by the entrance and hoping someone will let you in.) So the plan is to use multiple modes, including phone, in-person household visits, random street intercepts, and mail.

The question then is how to combine these samples. My suggested approach is to divide the population into poststrata based on various factors (age, ethnicity, family type, housing type, etc.), then to pool responses within each poststratum, then to run some regressions including poststrata and also indicators for mode, to understand how respondents from different modes differ after controlling for the demographic/geographic adjustments.
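Here is a minimal sketch of the pooling step, in Python, under some loud assumptions: the poststratum labels, population counts, and responses are all made up for illustration, and the "mode gap" below is a crude stand-in for the mode-indicator coefficient in the regressions (a within-poststratum difference in means, not a fitted model).

```python
from collections import defaultdict

def poststratified_mean(responses, pop_counts):
    """Pool responses across modes within each poststratum, then weight
    each poststratum mean by its (assumed known) population count."""
    sums, ns = defaultdict(float), defaultdict(int)
    for stratum, _mode, y in responses:
        sums[stratum] += y
        ns[stratum] += 1
    if not ns:
        raise ValueError("no responses")
    total = sum(pop_counts[s] for s in ns)  # strata with no respondents are dropped
    return sum(pop_counts[s] * sums[s] / ns[s] for s in ns) / total

def mode_gap(responses, pop_counts, mode_a, mode_b):
    """Population-weighted average of within-poststratum differences in
    mean response (mode_a minus mode_b), over strata seen under both modes."""
    cells = defaultdict(list)
    for stratum, mode, y in responses:
        cells[(stratum, mode)].append(y)
    shared = [s for s in pop_counts
              if (s, mode_a) in cells and (s, mode_b) in cells]
    if not shared:
        raise ValueError("no poststratum has respondents from both modes")
    total = sum(pop_counts[s] for s in shared)
    gap = 0.0
    for s in shared:
        mean_a = sum(cells[(s, mode_a)]) / len(cells[(s, mode_a)])
        mean_b = sum(cells[(s, mode_b)]) / len(cells[(s, mode_b)])
        gap += pop_counts[s] * (mean_a - mean_b)
    return gap / total

# Hypothetical data: (poststratum, mode, binary response).
pop_counts = {"young": 600, "old": 400}
responses = [
    ("young", "phone", 1), ("young", "phone", 0),
    ("young", "mail", 0), ("young", "mail", 0),
    ("old", "phone", 1), ("old", "phone", 1),
    ("old", "mail", 1), ("old", "mail", 0),
]
estimate = poststratified_mean(responses, pop_counts)
gap = mode_gap(responses, pop_counts, "phone", "mail")
```

With real data you would want a regression (probably multilevel) rather than raw cell means, since many poststratum-by-mode cells will be small or empty.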

Maybe this has already been done and written up somewhere?

P.S. As you try to do this sort of thing more carefully you run up against the sorts of issues discussed in the Struggles paper. So this is definitely statistical research, not merely an easy application of existing methods.

P.P.S. Cyrus has some comments, which for convenience I’ll repost here:

It’s interesting to consider this problem by combining a “finite population” perspective with some ideas about “principal strata” from the causal inference literature. Suppose a finite population U from which we draw a sample of N units. We have two modes of contact, A and B. Suppose for the moment that each unit can be characterized by one of the following response types (these are the “principal strata”):

Type   Mode A response   Mode B response
I      1                 1
II     1                 0
III    0                 1
IV     0                 0

Then, there are two cases to consider, depending on whether mode of contact affects response:

Mode of contact does not affect response

This might be a valid assumption if the questions of interest are not subject to social desirability biases, interviewer effects, etc. In this case, it is easy to define a target parameter as the average response in the population. You could proceed efficiently by first applying mode A to the sample, and then applying mode B to those who did not respond with mode A. At the end, you would have outcomes for types I, II, and III units, and you’d have an estimate of the rate of type IV units in the population. You could content yourself with an estimate for the average response on the type I, II, and III subpopulation. If you wanted to recover an estimate of the average response for the full population (including type IV’s), you would effectively have to impute values for type IV respondents. This could be done by using auxiliary information either to genuinely impute or (in a manner that is pretty much equivalent) to determine which type I, II, or III units resemble the missing type IV units, and up-weight. In any case, if the response of interest has finite support, one could also compute “worst case” (Manski-type) bounds on the average response by imputing maximum and minimum values to type IV units.
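The worst-case bounds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a simple random sample with equal weights, an outcome with known support [y_min, y_max], and made-up data. The function name and arguments are hypothetical.

```python
def manski_bounds(observed, n_sample, y_min, y_max):
    """Worst-case bounds on the population mean.

    observed: outcomes from units that responded under mode A or under the
              follow-up mode B (types I, II, and III).
    n_sample: total number of sampled units; the remainder are treated as
              type IV nonrespondents and imputed at y_min, then at y_max.
    """
    n_resp = len(observed)
    if n_resp > n_sample:
        raise ValueError("more respondents than sampled units")
    n_missing = n_sample - n_resp
    total = sum(observed)
    lower = (total + n_missing * y_min) / n_sample
    upper = (total + n_missing * y_max) / n_sample
    return lower, upper

# Hypothetical example: 10 sampled units, 6 respond, binary outcome.
observed = [1, 0, 1, 1, 0, 1]
lower, upper = manski_bounds(observed, n_sample=10, y_min=0, y_max=1)
type_iv_rate = 1 - len(observed) / 10  # estimated share of type IV units
```

The width of the interval equals the type IV rate times the range of the outcome, so the bounds are only informative when nonresponse under both modes is rare.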

Mode of contact affects response

This might be relevant if, for example, the modes of contact are phone call versus face-to-face interview, and outcomes being measured vary depending on whether the respondent feels more or less exposed in the interview situation. This possibility makes things a lot trickier. In this case, each unit is characterized by a response under mode A and another under mode B (that is, two potential outcomes). One immediately faces a quandary of defining the target parameter. Is it the average of responses under the two modes of contact? Maybe it is some “latent” response that is imperfectly revealed under the two modes of contact? If so, how can we characterize this “imperfection”? Furthermore, only for type I individuals will you be able to obtain information on both potential responses. Does it make sense to restrict ourselves to this subpopulation? If not, then we would again face the need for imputation. A design that applied both mode A and mode B to the complete sample would mechanically reveal the proportion of type I units in the population, and by implication would identify the proportion of type II, III, and IV units. For type II units we could use mode A responses to improve imputations for mode B responses, and vice versa for type III respondents. Type IV respondents’ contributions to our estimate of the “average response” would be based purely on auxiliary information. Again, one could construct worst case bounds by imputing maximum and minimum response values for each of the missing response types.
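To make the "mechanically reveal" point concrete, here is a small Python sketch that tabulates the principal strata when both modes are applied to every sampled unit. It assumes made-up response indicators and ignores any effect of the order in which the modes are applied; the names are hypothetical.

```python
from collections import Counter

# (responds under mode A, responds under mode B) -> principal stratum
TYPES = {(1, 1): "I", (1, 0): "II", (0, 1): "III", (0, 0): "IV"}

def stratum_shares(response_pairs):
    """Sample proportion of each principal stratum, given one
    (mode A response, mode B response) indicator pair per unit."""
    if not response_pairs:
        raise ValueError("no units")
    counts = Counter(TYPES[(int(bool(a)), int(bool(b)))]
                     for a, b in response_pairs)
    n = len(response_pairs)
    return {t: counts[t] / n for t in ("I", "II", "III", "IV")}

# Hypothetical sample of 8 units, each contacted by both modes.
pairs = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0), (1, 0)]
shares = stratum_shares(pairs)
```

Only the type I share comes with both potential responses observed; for types II and III one of the two responses has to be imputed, and for type IV both do.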

One wrinkle that I ignored above was that the order of modes of contact may affect either response behavior or outcomes reported. This multiplies the number of potential response behaviors and the number of potential outcome responses given that the unit is interviewed. You could get some way past these issues by randomizing the order of mode of contact, e.g., A then B for one half, and B then A for the other half. But you would have to impose some more assumptions to make use of this random assignment. E.g., you’d have to assume that A-then-B always-responders are exchangeable with B-then-A always-responders in order to combine the information from the always-responders in each half-sample. Or, you could “shift the goal posts” by saying that all you are interested in is the average of responses from modes A and B under the A-then-B design.

Update:

The above analysis did not explore how other types of assumptions might help to identify the population average. Andy’s proposal to use post-stratification and regressions relies (according to my understanding) on the assumption that potential outcomes are independent of mode of contact conditional on covariates. Formally, if the mode of contact is $M$, taking on values $A$ or $B$, the potential outcome under mode of contact $m$ is $Y(m)$, $S$ is the principal stratum, and $X$ is a covariate, then $Y(m) \perp M \mid S, X$ implies that,

$E[Y(m) \mid S = s, X = x] = E[Y(m) \mid M = m, S = s, X = x]$.

As discussed above, the design that applies modes A and B to all units in the sample can determine principal stratum membership, and so these covariate- and principal-stratum specific imputations can be applied. Ordering effects will again complicate things, and so more assumptions would be needed. A worthwhile type of analysis would be to study evidence of mode-of-contact as well as ordering effects among the type I (always-responder) units.

Now, it may be that mode of contact affects response but units are contacted via either mode A or B. Then, a unit’s principal stratum membership is not identifiable, nor is the proportion of types I through IV identifiable (we would end up with two mixtures of responding and non-responding types, with no way to parse out relative proportions of the different types). If some kind of response “monotonicity” held, then that would help a little. Response monotonicity would mean that either type II or type III responders didn’t exist. Otherwise, we would have to impose more stringent assumptions. The common one would be that principal stratum membership is independent of potential responses conditional on covariates. This is a classic “ignorable non-response” assumption, and it suffers from having no testable implications.
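A small Python sketch of what is and is not identified in this either-A-or-B design. It assumes (this is my assumption, not something stated above) that units are randomly assigned to a mode, so that the response rate in the mode A arm estimates the share of types I plus II and the rate in the mode B arm estimates the share of types I plus III. The function names and the rates are hypothetical.

```python
def type_i_bounds(rate_a, rate_b):
    """Bounds on the share of type I (always-responder) units when only
    the two marginal response rates are known.

    rate_a = share(I) + share(II), rate_b = share(I) + share(III), and the
    four shares sum to 1, which gives the standard Frechet-type bounds."""
    lower = max(0.0, rate_a + rate_b - 1.0)
    upper = min(rate_a, rate_b)
    return lower, upper

def shares_under_monotonicity(rate_a, rate_b):
    """Point-identified shares if there are no type III units, i.e. anyone
    who would respond under mode B would also respond under mode A."""
    if rate_b > rate_a:
        raise ValueError("rate_b > rate_a is inconsistent with no type III units")
    return {"I": rate_b, "II": rate_a - rate_b, "III": 0.0, "IV": 1.0 - rate_a}

# Hypothetical response rates in the two randomly assigned arms.
bounds = type_i_bounds(0.75, 0.5)
shares = shares_under_monotonicity(0.75, 0.5)
```

Without monotonicity the type I share is only bounded; with it, the two marginal response rates pin down all four shares, which is the sense in which monotonicity "helps a little." It says nothing about the outcomes of the nonresponding types.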

This is an interesting problem that has broad implications for market research too so I wonder if there is work done already in that area. I think there could be two separate issues with multiple modes:

1) Response rates differ by channel, possibly interacting with self-selection into channels. This affects the selection of comparison groups.

2) The same person may answer questions differently if reached via different channels. This affects the survey answers.

Multi-unit dwellings can be surveyed without violating IRB. You need to contact the building manager to gain access to the units. It is a pain, but worth it if you really need to get in touch with individuals in person.

This is an interesting question. One way to view it is from a finite population, principal strata perspective. I put down some thoughts from that perspective here:

http://cyrussamii.com/?p=867

This helps to pinpoint what exactly you need to model and to what extent you need to do it to obtain estimates for the whole population. Namely, the modeling can be used to impute values for different types of non-responding units whose proportions are identifiable under certain designs.

Cap-recap?

This is called a "mixed mode" survey and there is a literature associated with it.

As Kaiser remarked, the main issue is how to separate mode effects from selection effects. One approach was recently published here: http://poq.oxfordjournals.org/content/74/5/1027.s…

If you are going to be doing such a systematic investigation of survey mode comparability, you should seriously consider adding an on-line sample. I'm sure you are familiar with the outfits (primarily Knowledge Networks & YouGov/Polimetrix) that use various recruitment & stratification methods to try to generate valid & representative on-line panels. Indeed, you might be able to find some guidance in the literature that looks at the comparability of these stratified on-line sampling techniques to random digit dial (which is becoming more & more suspect as response rates plummet, people switch to cell etc.) Jon Krosnick at Stanford has done some of this work.

How does the census handle it?

Wasn't the original work on mixed modes done by Campbell and Cook using MTMM designs (multi-trait/multi-method)?

Rand developed a methodology for the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey.

The relevant article is: Elliott MN, Zaslavsky AM, et al., "Effects of Survey Mode, Patient Mix, and Nonresponse on CAHPS Hospital Survey Scores," Health Services Research, 44(2):501-508, 2009

Additionally, the application of the adjustment is described here: http://www.hcahpsonline.org/modeadjustment.aspx