
Missing data imputation updating one variable at a time

Masanao discovered this interesting paper by Ralf Munnich and Susanne Rassler. From the abstract:

In this paper we [Munnich and Rassler] discuss imputation issues in large-scale data sets with different scaled variables laying special emphasis on binary variables. Since fitting a multivariate imputation model can be cumbersome, univariate specifications are proposed which are much easier to perform. The regression-switching or chained equations Gibbs sampler is proposed and possible theoretical shortcomings of this approach are addressed as well as data problems.
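To make "regression switching" concrete, here is a minimal sketch of the idea. It rests on our own simplifying assumptions and is not the authors' code, nor how Mice, Iveware, or mi implement it. There are only two variables: a continuous `x` and a binary `b`. Each is imputed from the other by a univariate model (linear for `x`, logistic for `b`). Point estimates plus random noise stand in for full posterior draws of the regression parameters, so this version will tend to understate imputation uncertainty. The function names (`fit_linear`, `fit_logistic`, `chained_impute`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_linear(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual sd)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / max(n - 2, 1))


def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson fit of P(y = 1) = sigmoid(a + b*x); returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += y - p          # gradient of the log-likelihood
            gb += (y - p) * x
            haa += w             # entries of X'WX (the negative Hessian)
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        if det <= 1e-12:         # (quasi-)separation: stop iterating
            break
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


def chained_impute(x, b, n_iter=10, seed=0):
    """Alternate univariate imputations of x (continuous) and b (binary).

    Missing entries are None. Each conditional model is refit on the cases
    where its outcome was observed, using the current imputations of the
    other variable, and then used to redraw its own missing entries.
    """
    rng = random.Random(seed)
    miss_x = sorted(i for i, v in enumerate(x) if v is None)
    miss_b = sorted(i for i, v in enumerate(b) if v is None)
    obs_x = [v for v in x if v is not None]
    obs_b = [v for v in b if v is not None]
    # Start from random draws of the observed values (copies, inputs untouched).
    x = [rng.choice(obs_x) if v is None else v for v in x]
    b = [rng.choice(obs_b) if v is None else v for v in b]
    skip_x, skip_b = set(miss_x), set(miss_b)
    for _ in range(n_iter):
        # Step 1: x | b via linear regression plus normal noise.
        keep = [i for i in range(len(x)) if i not in skip_x]
        a0, a1, sd = fit_linear([b[i] for i in keep], [x[i] for i in keep])
        for i in miss_x:
            x[i] = a0 + a1 * b[i] + rng.gauss(0.0, sd)
        # Step 2: b | x via logistic regression plus a Bernoulli draw.
        keep = [i for i in range(len(b)) if i not in skip_b]
        c0, c1 = fit_logistic([x[i] for i in keep], [b[i] for i in keep])
        for i in miss_b:
            b[i] = 1 if rng.random() < sigmoid(c0 + c1 * x[i]) else 0
    return x, b
```

Running `chained_impute` with several different seeds gives several completed data sets, one per imputation. The sketch also shows where the "theoretical shortcomings" come from. The two conditional models are specified separately, and nothing guarantees that they are compatible with any single joint distribution of `x` and `b`.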

A simulation study is done based on the data of the German Microcensus, which is often used to analyse unemployment. Multiple imputation, raking, and calibration techniques are compared for estimating the number of unemployed in different settings. We find that the logistic multiple imputation routine for binary variable, in some settings, may lead to poor point as well as variance estimates. To overcome possible shortcomings of the logistic regression imputation, we derive a multiple imputation matching algorithm which turns out to work well.
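The abstract does not spell out the authors' matching algorithm, so the following is only a generic stand-in for the idea. It is a nearest-neighbour, predictive-mean-matching-style imputation. Each missing binary value is copied from a randomly chosen observed "donor" whose predicted probability is among the `k` closest to that of the missing case. The scores are assumed to come from some previously fitted model, such as a logistic regression. `match_impute` and its `k` parameter are hypothetical names for illustration.

```python
import random


def match_impute(scores, y, k=3, seed=0):
    """Impute missing entries of a binary y by matching on a predicted score.

    scores: predicted P(y = 1) for every case, from any fitted model
    y:      list of 0/1 values, with None marking missing entries
    Returns a new list; y itself is not modified.
    """
    rng = random.Random(seed)
    donors = [j for j, v in enumerate(y) if v is not None]
    out = list(y)
    for i, v in enumerate(y):
        if v is None:
            # The k observed cases whose scores are closest to case i.
            nearest = sorted(donors, key=lambda j: abs(scores[j] - scores[i]))[:k]
            # Copy the observed value of one randomly chosen donor.
            out[i] = y[rng.choice(nearest)]
    return out
```

Every imputed value here is an actually observed value from a similar case, so the model is only used to rank cases, not to supply the imputed probability directly. Repeating the call with different seeds, and ideally with refitted or posterior-drawn model parameters, would give the multiple imputations.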

This is important stuff. They refer to the packages Mice and Iveware, which inspired our new and improved (I hope) “mi” package; mi is more flexible than these predecessors. Unfortunately, with this flexibility comes the possibility of more problems, so it’s good to see this sort of research. I like the paper a lot (except for the ugly color figures on pages 12–16!).

One Comment

  1. Anonymous says:

    Those were figures? I thought they were flags. Ugly is right.