Specifying and implementing nonparametric and semiparametric survival estimators in two-stage (nested) cohort studies with missing case data

Publication type:
Article
Authors:
Mark, Steven D.; Katki, Hormuzd A.
Affiliations:
University of Colorado System; University of Colorado Anschutz Medical Campus; University of Colorado Denver; National Institutes of Health (NIH) - USA; NIH National Cancer Institute (NCI); NIH National Cancer Institute - Division of Cancer Epidemiology & Genetics
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1198/016214505000000952
Publication date:
2006
Pages:
460-471
Keywords:
gastric cardia cancer; helicobacter-pylori; esophageal cancer; risk; Linxian; polymorphisms; carcinoma
Abstract:
Since 1986, we have been studying a cohort of individuals from a region in China with epidemic rates of gastric cardia cancer and have conducted numerous two-stage studies to assess the association of various exposures with this cancer. Two-stage studies are a commonly used statistical design. Stage one involves observing the outcomes and accessible baseline covariate information on all cohort members, and stage two involves using the stage one observations to select a subset of the cohort for measurements of exposures that are difficult to obtain. When the outcomes are censored failure times, such as in our studies, the most common designs used are the case-cohort and nested case-control designs. One limitation of both these designs is that the estimators of the cumulative hazards, and hence survivals and absolute risks, are biased when some cases are missing the stage two measurements. In our experience, such missingness is present in virtually all two-stage studies that (like ours) use biological specimens to obtain exposure measurements. In earlier work we derived and characterized the efficiency of a class of nonparametric and a class of semiparametric cumulative hazard estimators that are unbiased regardless of whether or not all cases are measured. In this article we limit the presentation of the mathematical derivation of these two classes to aspects important to study design and analysis. We analyze data from a two-stage study that we conducted on the association of Helicobacter pylori infection with incident gastric cardia cancers. We discuss the substantive reasons why we deliberately sampled only 25% of the available cancer cases. Through simulations, we demonstrate that substantial variation in precision exists between unbiased estimators within each class, and express the origin of these differences in terms of parameters familiar to investigators. 
We describe how preexistent knowledge about these parameters can be used to increase estimator precision, and detail specific strategies for constructing such estimators. Computer code in R that implements these estimators is available from the authors on request.