BAYESIAN COX REGRESSION FOR LARGE-SCALE INFERENCE WITH APPLICATIONS TO ELECTRONIC HEALTH RECORDS

Document type:
Article
Authors:
Jung, Alexander Wolfgang; Gerstung, Moritz
Affiliations:
European Molecular Biology Laboratory (EMBL); European Bioinformatics Institute; Helmholtz Association; German Cancer Research Center (DKFZ)
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/22-AOAS1658
Publication date:
2023
Pages:
1064-1085
Keywords:
variable selection; cardiovascular disease; regularization paths; survival analysis; adaptive lasso; models; risk; algorithms
Abstract:
The Cox model is an indispensable tool for time-to-event analysis, particularly in biomedical research. However, medicine is undergoing a profound transformation, generating data at an unprecedented scale, which opens new frontiers to study and understand diseases. With the wealth of data collected, new challenges for statistical inference arise, as datasets are often high dimensional, exhibit an increasing number of measurements at irregularly spaced time points, and are simply too large to fit in memory. Many current implementations for time-to-event analysis are ill-suited for these problems, as inference is computationally demanding and requires access to the full data at once. Here, we propose a Bayesian version for the counting process representation of Cox's partial likelihood for efficient inference on large-scale datasets with millions of data points and thousands of time-dependent covariates. Through the combination of stochastic variational inference and a reweighting of the log-likelihood, we obtain an approximation for the posterior distribution that factorizes over subsamples of the data, enabling the analysis in big data settings. Crucially, the method produces viable uncertainty estimates for large-scale and high-dimensional datasets. We show the utility of our method through a simulation study and an application to myocardial infarction in the UK Biobank, where we characterize the multivariate effects of risk factors and replicate results from individual studies. Our framework extends the Cox model to new data sources, like biobanks and EHR, the combination of which can provide new insights into our understanding of diseases.
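As a rough illustration of the ideas named in the abstract, the following minimal sketch evaluates Cox's partial log-likelihood in counting process form, with one row per (start, stop] interval, and then rescales the value computed on a random subsample. The function names, the argument layout, and the uniform N/n scaling are assumptions made for this example; the record does not specify the authors' reweighting scheme or their variational inference procedure, and neither is reproduced here.

```python
import numpy as np


def cox_partial_loglik(beta, start, stop, event, X):
    """Cox partial log-likelihood for rows in (start, stop] counting process form.

    Assumes start < stop for every row and no special handling of tied event times.
    """
    eta = X @ beta  # linear predictor for every interval row
    loglik = 0.0
    for i in np.flatnonzero(event):
        # Rows at risk at the event time stop[i]: start < stop[i] <= stop.
        at_risk = (start < stop[i]) & (stop[i] <= stop)
        loglik += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return loglik


def reweighted_subsample_loglik(beta, start, stop, event, X, batch_size, rng):
    """Partial log-likelihood of a random subsample, scaled by N / n.

    The uniform N / n factor is an illustrative assumption, not the paper's
    reweighting. Risk sets are formed within the subsample only.
    """
    n_total = len(stop)
    idx = rng.choice(n_total, size=batch_size, replace=False)
    scale = n_total / batch_size
    return scale * cox_partial_loglik(beta, start[idx], stop[idx], event[idx], X[idx])


# Toy data: three subjects, one covariate, events at t = 1 and t = 2.
start = np.array([0.0, 0.0, 0.0])
stop = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 0])
X = np.array([[0.5], [-1.0], [2.0]])
beta = np.zeros(1)

full = cox_partial_loglik(beta, start, stop, event, X)
batch = reweighted_subsample_loglik(
    beta, start, stop, event, X, batch_size=3, rng=np.random.default_rng(0)
)
```

With `beta` set to zero, every row at risk contributes equally, so the toy value reduces to minus the log of the product of the risk-set sizes. When the subsample is the whole dataset, the scaled value matches the full-data value.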