Efficient estimation of nonparametric genetic risk function with censored data

Publication type:
Article
Authors:
Wang, Yuanjia; Liang, Baosheng; Tong, Xingwei; Marder, Karen; Bressman, Susan; Orr-Urtreger, Avi; Giladi, Nir; Zeng, Donglin
Affiliations:
Beijing Normal University; Columbia University; Harvard University; Harvard University Medical Affiliates; Beth Israel Deaconess Medical Center; Tel Aviv University; Sackler Faculty of Medicine; University of North Carolina; University of North Carolina Chapel Hill
Journal:
BIOMETRIKA
ISSN/ISBN:
0006-3444
DOI:
10.1093/biomet/asv030
Publication date:
2015
Pages:
515-532
Keywords:
quantitative trait loci; parkinson-disease; kin-cohort; g2019s mutation; ashkenazi jews; penetrance; lrrk2; regression; phenotype; inference
Abstract:
With the discovery of an increasing number of causal genes for complex human disorders, it is crucial to assess the genetic risk of disease onset for individuals who are carriers of these causal mutations and to compare the distribution of the age-at-onset for such individuals with the distribution for noncarriers. In many genetic epidemiological studies that aim to estimate causal gene effect on disease, the age-at-onset of disease is subject to censoring. In addition, the mutation carrier or noncarrier status of some individuals may be unknown, due to the high cost of in-person ascertainment by collecting DNA samples or because of the death of older individuals. Instead, the probability of such individuals' mutation status can be obtained from various other sources. When mutation status is missing, the available data take the form of censored mixture data. Recently, various methods have been proposed for risk estimation using such data, but none is efficient for estimating a nonparametric distribution. We propose a fully efficient sieve maximum likelihood estimation method, in which we estimate the logarithm of the hazard ratio between genetic mutation groups using B-splines, while applying nonparametric maximum likelihood estimation to the reference baseline hazard function. Our estimator can be calculated via an expectation-maximization algorithm which is much faster than existing methods. We show that our estimator is consistent and semiparametrically efficient and establish its asymptotic distribution. Simulation studies demonstrate the superior performance of the proposed method, which is used to estimate the distribution of the age-at-onset of Parkinson's disease for carriers of mutations in the leucine-rich repeat kinase 2, LRRK2, gene.
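
The idea of censored mixture data fitted by an expectation-maximization algorithm can be illustrated with a deliberately simplified sketch. This is not the authors' sieve estimator: it assumes a constant hazard in each mutation group instead of a B-spline log hazard ratio with a nonparametric baseline hazard, and the names `simulate`, `em_censored_mixture`, and `p_carrier` are hypothetical. Each individual contributes a likelihood that mixes the carrier and noncarrier models, weighted by the known probability of being a carrier.

```python
import numpy as np


def simulate(n, lam0, lam1, seed=0):
    """Simulate censored ages at onset with partly unknown carrier status.

    Assumption for illustration: constant hazards lam0 (noncarriers) and
    lam1 (carriers); p_carrier is 0 or 1 when status is known, 0.5 otherwise.
    """
    rng = np.random.default_rng(seed)
    p_carrier = rng.choice([0.0, 0.5, 1.0], size=n)
    carrier = rng.random(n) < p_carrier          # true, partly hidden status
    onset = rng.exponential(1.0 / np.where(carrier, lam1, lam0))
    censor = rng.uniform(20.0, 90.0, size=n)     # censoring age
    time = np.minimum(onset, censor)
    event = (onset <= censor).astype(float)      # 1 = onset observed
    return time, event, p_carrier


def em_censored_mixture(time, event, p_carrier, n_iter=500, tol=1e-10):
    """EM for a two-group constant-hazard model with uncertain group labels."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    p = np.asarray(p_carrier, dtype=float)

    pooled = event.sum() / time.sum()            # pooled hazard estimate
    lam0, lam1 = 0.5 * pooled, 1.5 * pooled      # distinct starting values

    for _ in range(n_iter):
        # E-step: posterior probability of being a carrier given (time, event).
        lik1 = lam1 ** event * np.exp(-lam1 * time)
        lik0 = lam0 ** event * np.exp(-lam0 * time)
        w = p * lik1 / (p * lik1 + (1.0 - p) * lik0)

        # M-step: weighted events divided by weighted time at risk.
        new1 = (w * event).sum() / (w * time).sum()
        new0 = ((1.0 - w) * event).sum() / ((1.0 - w) * time).sum()

        converged = abs(new1 - lam1) + abs(new0 - lam0) < tol
        lam0, lam1 = new0, new1
        if converged:
            break

    return lam0, lam1


if __name__ == "__main__":
    t, d, p = simulate(4000, lam0=0.02, lam1=0.05, seed=1)
    est0, est1 = em_censored_mixture(t, d, p)
    print(f"noncarrier hazard: {est0:.4f}, carrier hazard: {est1:.4f}")
    print(f"estimated log hazard ratio: {np.log(est1 / est0):.3f}")
```

In this toy version the log hazard ratio is a single number. The method described in the abstract lets it vary with age through B-splines and leaves the baseline hazard unspecified, which is what makes the estimated distribution nonparametric.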