Efficient and doubly robust imputation for covariate-dependent missing responses

Type:
Article
Authors:
Qin, Jing; Shao, Jun; Zhang, Biao
Affiliations:
National Institutes of Health (NIH) - USA; NIH National Institute of Allergy & Infectious Diseases (NIAID); University of Wisconsin System; University of Wisconsin Madison; University System of Ohio; University of Toledo
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1198/016214508000000238
Publication date:
2008
Pages:
797-810
Keywords:
ratio confidence-intervals Empirical Likelihood estimators quantiles
Abstract:
In this article we study a well-known response missing-data problem. Missing data is a ubiquitous problem in medical and social science studies. Imputation is one of the most popular methods for dealing with missing data. The most commonly used imputation that makes use of covariates is regression imputation, in which the regression model can be parametric, semiparametric, or nonparametric. Parametric regression imputation is efficient but is not robust against misspecification of the regression model. Although nonparametric regression imputation (such as nearest-neighbor imputation and kernel regression imputation) is model-free, it is not efficient, especially if the dimension of the covariate vector is high (the well-known curse of dimensionality). Semiparametric regression imputation (such as partially linear regression imputation) can reduce the dimension of the covariate in nonparametric regression fitting but is not robust against misspecification of the linear component in the regression. Assuming that the missing mechanism is covariate-dependent and that the propensity function can be specified correctly, we propose a regression imputation method that has good efficiency and is robust against regression model misspecification. Furthermore, our method is valid as long as either the regression model or the propensity model is correct, a property known as the double-robustness property. We show that asymptotically the sample mean based on our imputation achieves the semiparametric efficient lower bound if both regression and propensity models are specified correctly. Our simulation results demonstrate that the proposed method outperforms many existing methods for handling missing data, especially when the regression model is misspecified. As an illustration, an economic observational data set is analyzed.
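
The double-robustness property mentioned in the abstract can be illustrated with a small simulation. The sketch below does not implement the authors' imputation method. It uses the standard augmented inverse-probability-weighted (AIPW) estimator of a mean as a stand-in, because it shares the property that the estimate remains consistent when either the regression model or the propensity model is correct. The function names (`fit_logistic`, `aipw_mean`), the simulated data, and the model choices are all assumptions made for illustration. Here the linear regression model is deliberately misspecified while the logistic propensity model is correct.

```python
import numpy as np


def fit_logistic(X, r, n_iter=25):
    """Fit P(r = 1 | X) by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (r - p)                       # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def aipw_mean(x, y, r):
    """AIPW estimate of E[Y]; entries of y where r == 0 are treated as missing."""
    X = np.column_stack([np.ones_like(x), x])
    obs = r == 1.0

    # Propensity model: logistic regression of the response indicator on x.
    beta = fit_logistic(X, r)
    pi_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))

    # Outcome model: linear regression fitted on respondents only.
    gamma, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    m_hat = X @ gamma

    # Regression prediction plus inverse-probability-weighted residual correction.
    y_filled = np.where(obs, y, 0.0)  # placeholder value is multiplied by r = 0
    return np.mean(m_hat + r * (y_filled - m_hat) / pi_hat)


# Simulated data (assumed design): the true mean of Y is 1 + 0 + 1 = 2.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + x**2 + rng.normal(size=n)   # quadratic term breaks the linear model
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + x)))      # covariate-dependent missingness
r = (rng.random(n) < pi_true).astype(float)
y[r == 0.0] = np.nan                            # responses are missing when r == 0

est = aipw_mean(x, y, r)
cc = np.nanmean(y)                              # naive complete-case mean
print(f"AIPW estimate: {est:.3f}, complete-case mean: {cc:.3f}, truth: 2.000")
```

In this design the complete-case mean is biased because units with larger covariate values are more likely to respond, while the AIPW estimate stays close to the true mean even though the outcome regression is wrong. Swapping the roles (a correct regression model with a misspecified propensity model) would be the other half of a double-robustness check.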