The dropout rate in his survey is over 60%. What should he do? I suggest MRP.

Alon Honig writes:

I work for a CPG company that conducts longitudinal surveys for analysis of customer behavior. In particular, they wanted to know how people are interacting with our product. Unfortunately, the designers of these surveys included so many questions (100+) that the dropout rate (the share of respondents who did not complete the survey) was over 60%. The researchers working with the data (all with academic backgrounds) told me that this dropout rate was in fact quite normal for such studies. In the past, when I did marketing analysis, we would start getting concerned about a dataset when the dropout rate was above 20%, because we knew there was something strange about the remaining population, making inference on the general population faulty. The research team acknowledged the issue but didn't seem to be concerned about the bias in their findings.

I wanted to know how we should think about the dropout rate after conducting a survey. Does this mean we should create a new survey? Or should we adjust our results to account for the dropout? What is a reasonable rate, anyway?

My quick answer is to do multilevel regression and poststratification to adjust for known differences between sample and population. Use multilevel regression to model each outcome of interest, conditional on whatever variables you think are predictive of dropout and the outcome. In your regression, include interactions of these predictors with anything you care about in your modeling. Then use poststratification to take the predictions from your model and average them over your population.
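
To make this concrete, here's a minimal sketch of the two steps in Python. Plain logistic regression stands in for the multilevel model here, and the data, variable names, and cell counts are all made up; in a real analysis you'd fit a hierarchical model with the relevant interactions and take cell counts from a census or customer database.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical survey sample: two demographics and one binary outcome.
n = 2000
sample = pd.DataFrame({
    "female": rng.integers(0, 2, n),
    "age_group": rng.integers(0, 4, n),  # 0: 18-29, 1: 30-44, 2: 45-59, 3: 60+
})
logit = -0.5 + 0.8 * sample["female"] + 0.3 * sample["age_group"]
sample["outcome"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Step 1, regression: model the outcome given variables thought to predict
# both dropout and the outcome. (A real MRP model would be multilevel and
# would include the interactions mentioned above.)
model = LogisticRegression().fit(sample[["female", "age_group"]],
                                 sample["outcome"])

# Step 2, poststratification: predict for every demographic cell and average
# the predictions weighted by each cell's size in the population.
cells = pd.DataFrame([(f, a) for f in (0, 1) for a in range(4)],
                     columns=["female", "age_group"])
cells["pop_count"] = [110, 90, 80, 120, 105, 95, 85, 115]  # made-up counts
cells["pred"] = model.predict_proba(cells[["female", "age_group"]])[:, 1]

mrp_estimate = np.average(cells["pred"], weights=cells["pop_count"])
print(f"poststratified estimate: {mrp_estimate:.3f}")
print(f"raw sample mean:         {sample['outcome'].mean():.3f}")
```

The multilevel part matters once the cells get small: partial pooling keeps sparse cells from being estimated off a handful of respondents.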

And, yes, you should still try your best to minimize dropout, and to identify what factors determine dropout so that you can try to measure them and include them in your model.

P.S. Just to clarify: I’m not saying that MRP automatically solves this problem. What I’m saying is that MRP is a framework that can allow us to attack the problem.

6 thoughts on "The dropout rate in his survey is over 60%. What should he do? I suggest MRP."

  1. Am I reading this right? A consumer survey with 100+ questions had a dropout rate of only 60%? I would have guessed closer to 95%. I've never seen a survey like that – I have no idea what a CPG company is, so maybe I'm not understanding the context.

  2. OK – three comments:

    1. Your terminology is nonstandard. One usually distinguishes between questionnaire response rates and item response rates. The term "dropout" usually relates to clinical trials and the like: after patients drop out you have nothing. On questionnaires things are different, and responders might pick and choose the questions they answer.

    2. The trick is to push for an aggressive follow-up and compare the responses of wave 1 to the responses of wave 2. This lets you determine whether response rates are informative, that is, whether they are related to the actual responses. In many cases they are not (see the first sketch after the references below).

    3. The data you are getting are compositional: the questions each customer answers or skips form a composition summing to 100%. The compositional structure can be analyzed with CoDa (compositional data analysis) methods (see the second sketch below).

    Some references:
    Kenett, R. S., and Salini, S., Modern Analysis of Customer Surveys: with Applications Using R (Wiley): https://www.amazon.com/Modern-Analysis-Customer-Surveys-Applications-ebook/dp/B0067PZ71Y

    https://onlinelibrary.wiley.com/doi/abs/10.1002/qre.2029
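
    As a concrete version of the wave comparison in point 2, here is a minimal sketch in Python. The data and the "satisfaction" outcome are hypothetical, and Welch's t-test stands in for whatever comparison suits the actual outcomes.

    ```python
    import pandas as pd
    from scipy import stats

    # Hypothetical data: wave 1 answered before the follow-up push, wave 2 after.
    df = pd.DataFrame({
        "wave":         [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
        "satisfaction": [7, 6, 8, 7, 9, 6, 5, 6, 4, 5],
    })

    wave1 = df.loc[df["wave"] == 1, "satisfaction"]
    wave2 = df.loc[df["wave"] == 2, "satisfaction"]

    # If late responders answer much like early ones, nonresponse is less
    # likely to be related to the outcome; if they differ, model or weight it.
    t, pval = stats.ttest_ind(wave1, wave2, equal_var=False)  # Welch's t-test
    print(f"wave 1 mean: {wave1.mean():.2f}, wave 2 mean: {wave2.mean():.2f}")
    print(f"Welch t = {t:.2f}, p = {pval:.3f}")
    ```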
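
    And a minimal illustration of the compositional view in point 3, assuming each respondent's answered/skipped counts are the composition of interest. The counts are made up, and this only gestures at the CoDa methods covered in the references.

    ```python
    import numpy as np

    # Hypothetical respondents: how many of 100 questions each one answered.
    total = 100
    answered = np.array([100, 85, 60, 40, 30], dtype=float)
    parts = np.column_stack([answered, total - answered])
    parts = np.clip(parts, 0.5, None)  # standard CoDa move: replace zeros
    comp = parts / parts.sum(axis=1, keepdims=True)  # close to proportions

    # Centered log-ratio transform: log of each part over its row's
    # geometric mean; ordinary multivariate methods can then be applied.
    gmean = np.exp(np.log(comp).mean(axis=1, keepdims=True))
    clr = np.log(comp / gmean)
    print(clr)
    ```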

  3. Given that Alon Honig’s CPG company appears to be JUUL Labs, Inc., it would be interesting to know whether the survey data were collected before or during its current onslaught of federal and state investigations related to vaping-related illnesses and deaths.

  4. "…we knew there was something strange about the remaining population, making inference on the general population faulty."

    Shouldn’t you always assume this? I mean, you aren’t going to find me filling out any surveys except under exceptional circumstances. And about half of those exceptional circumstances would mean giving false answers…

  5. “… do multilevel regression and poststratification to adjust for known differences between sample and population.”

    How does one objectively determine such "known differences" in this case?
    How are they readily "known"?

    • Kst:

      By “known,” I mean “measured.”

      An example of a known (measured) difference: 62% of the sample is female, but only 52% of the population is female.

      An example of an unknown (unmeasured) difference: We would expect respondents to a political survey to be more interested in politics, on average, compared to the general population.

      Adjusting for known differences is pretty straightforward. Adjusting for unknown differences requires extra modeling.
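
      In the female/male example above, the adjustment is just a reweighted mean. Here is the arithmetic with made-up group means of some outcome; only the 62%/52% shares come from the example.

      ```python
      # Hypothetical outcome means by group; shares are from the example above.
      mean_female, mean_male = 0.40, 0.30
      raw      = 0.62 * mean_female + 0.38 * mean_male  # weight by sample shares
      adjusted = 0.52 * mean_female + 0.48 * mean_male  # weight by population shares
      print(raw, adjusted)  # 0.362 vs. 0.352: the adjustment pulls the estimate down
      ```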
