Tutorial: Handling missing data in R with MICE


Stef van Buuren, TNO Quality of Life, Leiden and Faculty of Social Sciences, University of Utrecht, The Netherlands
Karin Groothuis-Oudshoorn, Health Technology and Services Research, University of Twente, The Netherlands

Abstract

Multiple imputation (Rubin 1987, 1996) is the method of choice for complex incomplete data problems. Missing data that occur in more than one variable present a special challenge. Two general approaches for imputing multivariate data have emerged: joint modeling (JM) and fully conditional specification (FCS) (van Buuren 2007). Multivariate Imputation by Chained Equations (MICE) is the name of software for imputing incomplete multivariate data by FCS.

In this tutorial we present the R package mice v2.1, which extends the functionality of mice v1.0 in several ways (van Buuren and Groothuis-Oudshoorn 2009). The tutorial takes a hands-on, stepwise approach to using mice v2.1 to solve incomplete data problems in real data. Its goal is to provide sound and practical imputation techniques for obtaining appropriate statistical inferences from incomplete data.

The tutorial focuses on the specification of the imputation model, the most challenging step in multiple imputation. There is no magical setting that produces appropriate imputations in every problem. The tutorial will teach you how to go beyond the default settings. In addition, we outline practical tools and techniques for analyzing the imputed data.
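As a rough illustration of what going beyond the default settings can look like, the sketch below changes the imputation method and the predictor set for one variable, and then analyzes and pools the imputed data. It assumes a recent version of mice and the nhanes example data shipped with the package; argument names may differ in v2.1. The particular changes (using "norm" for bmi and dropping hyp as a predictor) and the analysis model are arbitrary choices for demonstration, not recommendations.

```r
library(mice)  # provides mice(), with(), pool() and the nhanes example data

# Dry run (maxit = 0): set up the default imputation model without imputing.
ini <- mice(nhanes, maxit = 0, printFlag = FALSE)
meth <- ini$method            # default imputation method per variable
pred <- ini$predictorMatrix   # rows: variable to impute, columns: predictors

# Illustrative (assumed) departures from the defaults:
meth["bmi"] <- "norm"         # Bayesian linear regression for bmi
pred["bmi", "hyp"] <- 0       # do not use hyp when imputing bmi

# Create m = 5 imputed datasets with the modified imputation model.
imp <- mice(nhanes, m = 5, method = meth, predictorMatrix = pred,
            seed = 123, printFlag = FALSE)

# Fit the analysis model on each completed dataset, then pool the
# estimates across imputations using Rubin's rules.
fit <- with(imp, lm(chl ~ age + bmi))
pooled <- pool(fit)
summary(pooled)
```

The dry run is a convenient way to obtain correctly named method and predictor objects that can be edited before the real imputation run.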

Outline

Topics will include:

Prerequisites

Elementary knowledge of general statistical concepts and (linear) statistical models is assumed. Basic experience with programming in R is also useful.

Potential attendees

R users and researchers who want to analyse datasets with missing data.

References

Rubin DB (1987). Multiple imputation for nonresponse in surveys. Wiley, New York.
Rubin DB (1996). Multiple Imputation after 18+ Years. Journal of the American Statistical Association, 91(434), 473-489.
Van Buuren S (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16(3), 219-242.
Van Buuren S, Groothuis-Oudshoorn K (2009). MICE: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, forthcoming.