您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 统计学 > Journal of the American Statistical Association > 2008 > 482期

Efficient and doubly robust imputation for covariate-dependent missing responses

成果类型：

Article

署名作者：

Qin, Jing; Shao, Jun; Zhang, Biao

署名单位：

National Institutes of Health (NIH) - USA; NIH National Institute of Allergy & Infectious Diseases (NIAID); University of Wisconsin System; University of Wisconsin Madison; University System of Ohio; University of Toledo

刊物名称：

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION

ISSN/ISSBN：

0162-1459

DOI：

10.1198/016214508000000238

发表日期：

2008

页码：

797-810

关键词：

ratio confidence-intervals Empirical Likelihood estimators quantiles

摘要：

In this article we study a well-known response missing-data problem. Missing data is an ubiquitous problem in medical and social science studies. Imputation is one of the most popular methods for dealing with missing data. The most commonly used imputation that makes use of covariates is regression imputation, in which the regression model can be parametric, semiparametric, or nonparametric. Parametric regression imputation is efficient but is not robust against misspecification of the regression model. Although nonparametric regression imputation (such as nearest-neighbor imputation and kernel regression imputation) is model-free, it is not efficient, especially if the dimension of covariate vector is high (the well-known problem of curse of dimensionality). Semiparametric regression imputation (such as partially linear regression imputation) can reduce the dimension of the covariate in nonparametric regression fitting but is not robust against misspecification of the linear component in the regression. Assuming that the missing mechanism is covariate-dependent and that the propensity function can be specified correctly, we propose a regression imputation method that has good efficiency and is robust against regression model misspecification. Furthermore, our method is valid as long as either the regression model or the propensity model is correct, a property known as the double-robustness property. We show that asymptotically the sample mean based on our imputation achieves the semiparametric efficient lower bound if both regression and propensity models are specified correctly. Our simulation results demonstrate that the proposed method outperforms many existing methods for handling missing data, especially when the regression model is misspecified. As an illustration, an economic observational data set is analyzed.