Multiple imputation of discrete and continuous data by fully conditional specification
Stef van Buuren
TNO Quality of Life, Leiden, The Netherlands and University of Utrecht, The Netherlands, stef.vanbuuren{at}tno.nl
The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data. To achieve that goal, imputed values should preserve the structure in the data, as well as the uncertainty about this structure, and include any knowledge about the process that generated the missing data. Two approaches for imputing multivariate data exist: joint modeling (JM) and fully conditional specification (FCS). JM is based on parametric statistical theory and leads to imputation procedures whose statistical properties are known. JM is theoretically sound, but the joint model may lack the flexibility needed to represent typical data features, potentially leading to bias. FCS is a semi-parametric and flexible alternative that specifies the multivariate model by a series of conditional models, one for each incomplete variable. FCS provides tremendous flexibility and is easy to apply, but its statistical properties are difficult to establish. Simulation work shows that FCS behaves very well in the cases studied. The present paper reviews and compares the approaches. JM and FCS were applied to pubertal development data from 3801 Dutch girls who had missing data on menarche (two categories), breast development (five categories) and pubic hair development (six stages). Imputations for these data were created under two models: a multivariate normal model with rounding and a conditionally specified discrete model. The JM approach introduced biases in the reference curves, whereas FCS did not. The paper concludes that FCS is a useful and easily applied flexible alternative to JM when no convenient and realistic joint distribution can be specified.
Statistical Methods in Medical Research, Vol. 16, No. 3, 219-242 (2007)
DOI: 10.1177/0962280206074463
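The FCS idea summarized above (one conditional model per incomplete variable, cycled repeatedly until the imputations stabilize) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes every variable is continuous and uses a Bayesian linear-regression draw as the conditional model for each one, whereas the puberty application in the paper used discrete conditional models for the categorical stages. The function names `fcs_impute` and `draw_norm` are hypothetical, and the data are assumed to be a 2-D NumPy array with `NaN` marking missing entries.

```python
import numpy as np


def draw_norm(X_obs, y_obs, X_mis, rng):
    """Draw imputations for y at rows X_mis under a Bayesian linear model.

    Assumption: linear-normal conditional model with a flat prior.
    Sigma and beta are drawn from their posterior before adding residual
    noise, so imputations reflect parameter uncertainty too.
    """
    n, p = X_obs.shape
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = xtx_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    # sigma* = sqrt(RSS / g), with g ~ chi-square(n - p)
    sigma = np.sqrt(resid @ resid / rng.chisquare(n - p))
    # beta* = beta_hat + sigma* L z, where L L' = (X'X)^-1 and z ~ N(0, I)
    beta = beta_hat + sigma * np.linalg.cholesky(xtx_inv) @ rng.standard_normal(p)
    return X_mis @ beta + sigma * rng.standard_normal(X_mis.shape[0])


def fcs_impute(data, n_iter=10, seed=None):
    """Return one completed copy of `data` (2-D array, NaN = missing).

    Hypothetical helper: assumes all columns are continuous and that each
    column has enough observed values to fit its regression.
    """
    rng = np.random.default_rng(seed)
    data = np.array(data, dtype=float)  # work on a copy
    miss = np.isnan(data)
    # Step 1: initialize each column with random draws from its observed values.
    for j in range(data.shape[1]):
        observed = data[~miss[:, j], j]
        data[miss[:, j], j] = rng.choice(observed, size=miss[:, j].sum())
    # Step 2: cycle through the incomplete columns; each one is re-imputed
    # from its conditional model given the current values of all others.
    for _ in range(n_iter):
        for j in range(data.shape[1]):
            if not miss[:, j].any():
                continue
            others = np.delete(data, j, axis=1)
            X = np.column_stack([np.ones(len(data)), others])  # add intercept
            data[miss[:, j], j] = draw_norm(
                X[~miss[:, j]], data[~miss[:, j], j], X[miss[:, j]], rng
            )
    return data
```

Calling `fcs_impute(y, seed=k)` for k = 0, ..., 4 would yield five completed datasets; each is analysed separately and the estimates are then pooled (for example with Rubin's rules). The parameter draw inside `draw_norm` is the design choice that matters: imputing from the fitted coefficients alone would understate the uncertainty the abstract says imputations must preserve. Production implementations such as the R package mice also offer discrete conditional models (e.g. logistic and polytomous regression) of the kind the paper applied to the puberty data.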

This article has been cited by other articles:

K. J. Lee and J. B. Carlin. Multiple Imputation for Missing Data: Fully Conditional Specification Versus Multivariate Normal Imputation. Am. J. Epidemiol., January 27, 2010: kwp425.
 
R. A. Hayward, H. M. Krumholz, D. M. Zulman, J. W. Timbie, and S. Vijan. Optimizing Statin Treatment for Primary Prevention of Coronary Artery Disease. Ann Intern Med, January 19, 2010; 152(2): 69-77.
 
Y. He. Missing Data Analysis Using Multiple Imputation: Getting to the Heart of the Matter. Circ Cardiovasc Qual Outcomes, January 1, 2010; 3(1): 98-105.
 
H. Goldstein, J. Carpenter, M. G. Kenward, and K. A. Levin. Multilevel models with multivariate mixed response types. Statistical Modelling, October 1, 2009; 9(3): 173-197.
 
D. A. Wagstaff, S. Kranz, and O. Harel. A preliminary study of active compared with passive imputation of missing body mass index values among non-Hispanic white youths. Am. J. Clinical Nutrition, April 1, 2009; 89(4): 1025-1030.
 
B. B. Bruce, P. Preechawat, N. J. Newman, M. J. Lynn, and V. Biousse. Racial differences in idiopathic intracranial hypertension. Neurology, March 11, 2008; 70(11): 861-867.