Modeling reporting delays and reporting corrections in cancer registry data

Document type:
Article
Authors:
Midthune, DN; Fay, MP; Clegg, LX; Feuer, EJ
Affiliations:
National Institutes of Health (NIH) - USA; NIH National Cancer Institute (NCI); National Institutes of Health (NIH) - USA; NIH National Institute of Allergy & Infectious Diseases (NIAID); National Institutes of Health (NIH) - USA; NIH National Cancer Institute (NCI); NIH Division of Cancer Control & Population Sciences
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1198/016214504000001899
Publication date:
2005
Pages:
61-70
Keywords:
truncated data aids incidence claims rates SURVEILLANCE prediction events TRENDS
Abstract:
The Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute is an authoritative source of cancer incidence statistics in the United States. The SEER program is a consortium of population-based cancer registries from different areas of the country. Each registry is charged with collecting data on all cancers that occur within its geographic area. As with any disease registry, there is a delay between the time that the disease (cancer) is first diagnosed and the time that it is reported to the registry. The SEER program has allowed for reporting delays of up to 19 months before releasing data for public use. Nevertheless, additional cases are discovered after the 19-month delay, and these cases are added in subsequent releases of the data. Further, any errors discovered are corrected in subsequent releases. Such reporting delays and corrections typically lead to underestimation of the cancer incidence rates in recent diagnosis years, making it difficult to monitor trends. In this article we study models that account for reporting delays and corrections in predicting eventual cancer counts for a diagnosis year from the preliminary counts. Previous models of this type have been studied, especially as applied to AIDS registries. We offer several additions to existing models. First, we explicitly model the reporting corrections. Second, we model the delay distribution with very general models, combining aspects of previous nonparametric-like models (i.e., models that have a separate parameter for each delay time) with more parametric models. Third, we allow random reporting-year effects in the model. Practical issues of model selection and how the data are classified are also discussed, particularly how the definition of a reporting correction may change depending on how subpopulations are defined. An example with SEER melanoma data is studied in detail.
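
To illustrate the idea of predicting an eventual count from a preliminary one, here is a minimal sketch. It is not the model studied in the article: it uses a simple ratio estimator with one factor per delay step, in the spirit of the "separate parameter for each delay time" models the abstract mentions, and it has no explicit correction terms and no random reporting-year effects. The function names and the layout of the input are assumptions made for this illustration. Each row holds the count for one diagnosis year as it stood in successive data releases, with None for releases that have not happened yet. Because the counts are net figures, a later correction that removes cases simply produces a factor below 1.

```python
from typing import List, Optional

# Hypothetical layout: triangle[i][d] is the count for diagnosis year i
# as published in the d-th release after the first; None = not yet released.
Triangle = List[List[Optional[float]]]


def development_factors(triangle: Triangle) -> List[float]:
    """Estimate one growth factor per delay step from all rows that
    have both the earlier and the later release available."""
    n_delays = len(triangle[0])
    factors = []
    for d in range(n_delays - 1):
        later_total = 0.0
        earlier_total = 0.0
        for row in triangle:
            if row[d] is not None and row[d + 1] is not None:
                later_total += row[d + 1]
                earlier_total += row[d]
        factors.append(later_total / earlier_total)
    return factors


def predict_eventual(triangle: Triangle) -> List[float]:
    """Scale each year's latest available count by the product of the
    factors for the delay steps it has not yet been through."""
    factors = development_factors(triangle)
    predictions = []
    for row in triangle:
        last = max(d for d, count in enumerate(row) if count is not None)
        estimate = float(row[last])
        for factor in factors[last:]:
            estimate *= factor
        predictions.append(estimate)
    return predictions


# Made-up counts for three diagnosis years (not SEER data).
example = [
    [100, 104, 105],    # oldest year: all three releases available
    [110, 115, None],   # one release still to come
    [120, None, None],  # most recent year: preliminary count only
]
predicted = predict_eventual(example)
```

In this made-up example the most recent year's preliminary count of 120 is scaled up by both estimated factors, which shows how unadjusted recent counts understate eventual incidence.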