Estimation and inference for logistic regression with covariate misclassification and measurement error in main study/validation study designs

Document type:
Article
Authors:
Spiegelman, D; Rosner, B; Logan, R
Affiliations:
Harvard University; Harvard T.H. Chan School of Public Health; Harvard University; Harvard T.H. Chan School of Public Health; Harvard University; Harvard University Medical Affiliates; Brigham & Women's Hospital
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.2307/2669522
Publication date:
2000
Pages:
51-61
Keywords:
person; measurement error; alloyed gold standard; confidence-intervals; breast-cancer; dietary-fat; risk; models; rationale
Abstract:
In epidemiological studies, continuous covariates are often measured with error and categorical covariates are often misclassified. Using the logistic regression model to represent the relationship between the binary outcome and the perfectly measured and classified covariates, the model for the observed main study data is derived. This derivation relies on the assumption that the error in the continuous covariates is multivariate normally distributed and uses a chain of logistic regression models to describe the misclassification processes. These model assumptions are empirically verified in the validation study, where the misclassified and mismeasured covariates are validated using perfectly measured and classified data. The full data likelihood, including contributions from both the main study and the validation study, is maximized to obtain simultaneous maximum likelihood estimates of the parameters of the underlying logistic regression model and of the measurement error and reclassification models. Standard asymptotic theory is applied. An example of this methodology is presented from the Nurses' Health Study, investigating the relationship between cumulative incidence of breast cancer and saturated fat, total energy, and alcohol intake. A detailed simulation study was conducted to investigate the small-sample properties of these likelihood-based estimates and inferential quantities. No single estimation/inference option performed satisfactorily when the main study/validation study size was representative of that typically encountered in practice; when the validation study was at least twice the usual size, features of asymptotic optimality were more apparent. By example and through simulation, the procedures appeared to be robust to misspecification of the order of the chain of conditional measurement error/reclassification models.
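
The likelihood construction described in the abstract can be illustrated with a deliberately reduced sketch. It is not the authors' implementation. It assumes a single continuous covariate and no misclassified categorical covariates, so the chain of logistic reclassification models is omitted. It also assumes an internal validation study in which the outcome is observed, and a normal linear error model for the true covariate given the observed one. All names (`neg_loglik`, `simulate`, the parameter values) are hypothetical and chosen for illustration only. Main study subjects contribute the outcome probability with the unobserved true covariate integrated out by Gauss-Hermite quadrature. Validation subjects contribute the outcome model evaluated at the true covariate, plus the error-model density.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm

# Gauss-Hermite nodes/weights for integrals against exp(-t^2)
NODES, WEIGHTS = np.polynomial.hermite.hermgauss(30)


def neg_loglik(params, main, valid):
    """Negative full-data log-likelihood (main study + validation study).

    Assumed (illustrative) models:
      outcome:  P(Y=1 | x) = expit(b0 + b1 * x), x = true covariate
      error:    x | X ~ Normal(a0 + a1 * X, s^2), X = observed covariate
    """
    b0, b1, a0, a1, log_s = params
    s = np.exp(log_s)  # optimise on the log scale so that s stays positive
    y_m, X_m = main
    y_v, x_v, X_v = valid

    # Main study: x is unobserved, so integrate it out of the outcome model.
    mean_m = a0 + a1 * X_m
    x_grid = mean_m[:, None] + np.sqrt(2.0) * s * NODES[None, :]
    p_m = (expit(b0 + b1 * x_grid) * WEIGHTS).sum(axis=1) / np.sqrt(np.pi)
    p_m = np.clip(p_m, 1e-12, 1 - 1e-12)
    ll_main = np.sum(y_m * np.log(p_m) + (1 - y_m) * np.log1p(-p_m))

    # Validation study: x is observed, so both models are evaluated directly.
    p_v = np.clip(expit(b0 + b1 * x_v), 1e-12, 1 - 1e-12)
    ll_outcome = np.sum(y_v * np.log(p_v) + (1 - y_v) * np.log1p(-p_v))
    ll_error = np.sum(norm.logpdf(x_v, loc=a0 + a1 * X_v, scale=s))

    return -(ll_main + ll_outcome + ll_error)


def simulate(n_main, n_valid, rng, beta=(-1.0, 0.7), alpha=(0.0, 0.8), s=0.6):
    """Simulate toy data under the assumed models (all values hypothetical)."""
    n = n_main + n_valid
    X = rng.normal(size=n)
    x = alpha[0] + alpha[1] * X + rng.normal(scale=s, size=n)
    y = rng.binomial(1, expit(beta[0] + beta[1] * x)).astype(float)
    main = (y[:n_main], X[:n_main])                   # true x discarded
    valid = (y[n_main:], x[n_main:], X[n_main:])      # true x retained
    return main, valid


rng = np.random.default_rng(0)
main, valid = simulate(2000, 300, rng)
start = np.zeros(5)
truth = np.array([-1.0, 0.7, 0.0, 0.8, np.log(0.6)])

# Outcome-model and error-model parameters are estimated simultaneously.
fit = minimize(neg_loglik, start, args=(main, valid), method="BFGS")
beta_hat = fit.x[:2]
print("estimated (b0, b1):", beta_hat)
```

In this sketch the ratio of `n_main` to `n_valid` is the knob that corresponds to the main study/validation study sizes examined in the simulation study. Extending the sketch to misclassified categorical covariates would require adding the logistic reclassification terms, which are left out here.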