Multiple imputation for model checking: completed-data plots with missing and latent data

Multiple imputation is the standard approach to accounting for uncertainty about missing or latent data in statistics. Multiple imputation can be considered as a special case of Bayesian posterior simulation, but its focus is not so much on the imputation model itself as on the imputations themselves, which can be used, along with the observed data, in subsequent “completed-data analyses” of the dataset that would have been observed (under the model) had there been no missingness.

How can we check the fit of models in the presence of missing data?

Checking model fit

A key part of applied statistics is exploratory data analysis and model checking. But it can be tricky to do these with incomplete data. Data are not, in general, missing completely at random, and an exploratory plot with a chunk of data removed will not necessarily look like what you’re used to seeing (even if the model is, indeed, correct). To put it another way, we want to see patterns in the underlying data, not artifacts arising from the data-collection process.

Our solution

An approach that we like uses multiple imputation. The idea is to use some imputation procedure to create multiple “completed data sets.” Then apply whatever graphical and model-checking procedures you were going to do, but to each completed data set. You can understand these model checks and graphs the way you always would.
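The procedure can be sketched in a few lines of Python. This is only an illustration: the function names, the toy data, and the 0 to 4 scoring are made up here. The imputation step is a naive random draw from the values observed at the same time point. It stands in for draws from a real fitted model, which would have to account for why the data are missing.

```python
import random
from collections import Counter


def impute_once(data, rng):
    """Return one completed copy of `data` (rows = patients, columns = times).

    Missing entries are None. Each one is filled with a random draw from the
    values observed at the same time point. ASSUMPTION: this naive hot-deck
    draw is a placeholder for a real imputation model, and every time point
    has at least one observed value.
    """
    n_times = len(data[0])
    observed = [[row[t] for row in data if row[t] is not None]
                for t in range(n_times)]
    return [[row[t] if row[t] is not None else rng.choice(observed[t])
             for t in range(n_times)]
            for row in data]


def category_proportions(completed, t, categories):
    """Share of patients in each category at time t (the bar segments)."""
    counts = Counter(row[t] for row in completed)
    return [counts[c] / len(completed) for c in categories]


def completed_data_summaries(data, m=5, seed=0, categories=(0, 1, 2, 3, 4)):
    """Build m completed data sets and run the same summary on each one."""
    rng = random.Random(seed)
    summaries = []
    for _ in range(m):
        completed = impute_once(data, rng)
        summaries.append([category_proportions(completed, t, categories)
                          for t in range(len(data[0]))])
    return summaries


# Made-up scores: 0 = no relief ... 4 = complete relief; None = missing.
toy = [
    [0, 1, 2, 3],
    [1, 2, None, None],
    [0, 0, 1, None],
    [2, 3, 4, 4],
    [1, None, None, None],
]

for i, summary in enumerate(completed_data_summaries(toy, m=3)):
    print("imputation", i, "proportions at last time point:", summary[-1])
```

Each element of the result is the input for one plot, so the same graph can be drawn once per completed data set and the copies compared side by side.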

In practice, this method can work OK even if, for convenience, you only apply it to a single randomly-imputed data set. Here’s an example.

Bring on the pain

The following pictures illustrate with a subset of data from a pain relief experiment. Patients given a placebo following a tooth extraction were asked about their pain status several times during the hours following the operation. The graph on the left (immediately below) shows the data: at each time point, the proportion of respondents with “complete pain relief” (the light area on the top of each bar), through “moderate pain relief,” down to “no pain relief” (the black zone at the bottom).

There appears to be steady improvement: more light colors (“complete pain relief”) and less black (“no pain relief”) throughout. But there’s a hitch: missing data! The width of each bar on the graph represents the proportion of patients who are still in the experiment at that time. By the end, most have dropped out and switched to a known effective pain reliever.
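To make the construction of the observed-data graph concrete, here is a rough sketch of the two quantities behind each bar: its width (the fraction of patients still reporting) and its segment heights (the share of each response among those who reported). The function name, the toy data, and the 0 to 4 scoring are invented for illustration and are not from the experiment.

```python
from collections import Counter


def observed_bar_summary(data, categories):
    """For each time point, return (width, heights).

    width   = fraction of patients still reporting at that time
    heights = share of each category among those who reported
              (empty list if nobody reported)
    """
    n_patients = len(data)
    summary = []
    for t in range(len(data[0])):
        reported = [row[t] for row in data if row[t] is not None]
        counts = Counter(reported)
        width = len(reported) / n_patients
        heights = ([counts[c] / len(reported) for c in categories]
                   if reported else [])
        summary.append((width, heights))
    return summary


# Made-up scores: 0 = no relief ... 4 = complete relief; None = dropped out.
toy = [
    [0, 1, 2, 3],
    [1, 2, None, None],
    [0, 0, 1, None],
    [2, 3, 4, 4],
    [1, None, None, None],
]

for t, (width, heights) in enumerate(observed_bar_summary(toy, range(5))):
    print(t, round(width, 2), [round(h, 2) for h in heights])
```

In this toy example the last bar is narrow and contains only high scores, because the patients who remain are the ones who were doing well. That is the kind of artifact the observed-data plot can produce.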

The graph on the right shows completed data, with missing data imputed based on a model fitted by Lew Sheiner and others. The imputed data are estimates of what would have happened had all the patients stayed with the placebo for all 6 hours. The bars are now equal width (since we have imputed all the missing cases), and it now appears that pain decreased at first but then, after 1 hour, started to increase.

[Figures: sheinercheck1a.png (observed data, placebo group) and sheinercheck1b.png (completed data, placebo group)]

Below are the observed data (on the left) and completed data (on the right) for patients with an active dose of the drug. Even here there is something to be learned from the completed-data plot.

[Figures: sheinercheck2a.png (observed data, active dose) and sheinercheck2b.png (completed data, active dose)]

References

Multiple imputation for model checking: completed-data plots with missing and latent data. To appear in Biometrics.

Exploratory data analysis for complex models. To appear (with discussion) in Journal of Computational and Graphical Statistics.