PROBABILISTIC CONTRASTIVE DIMENSION REDUCTION FOR CASE-CONTROL STUDY DATA

Document type:
Article
Authors:
Li, Didong; Jones, Andrew; Engelhardt, Barbara
Affiliations:
University of North Carolina; University of North Carolina Chapel Hill; University of North Carolina School of Medicine; Princeton University; University of California System; University of California San Francisco; The J David Gladstone Institutes
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/24-AOAS1877
Publication date:
2024
Pages:
2207-2229
Keywords:
VARIABLE SELECTION; PRINCIPAL COMPONENT ANALYSIS
Abstract:
Case-control experiments are essential to the scientific method, as they allow researchers to test biological hypotheses by looking for differences in outcome between cases and controls. It is then of interest to characterize variation that is enriched in a foreground (case) dataset relative to a background (control) dataset. For example, in a genomics context, the goal is to identify low-dimensional transcriptional structure unique to patients with a certain disease (cases) vs. those without that disease (controls). In this work we propose probabilistic contrastive principal component analysis (PCPCA), a probabilistic dimension reduction method designed for case-control data. We describe inference in PCPCA through a contrastive likelihood and show that our model generalizes PCA, probabilistic PCA, and contrastive PCA. We discuss how to set the tuning parameter in theory and in practice, and we show several of PCPCA's advantages over related methods in the analysis of case-control data, including greater interpretability, uncertainty quantification and principled inference, robustness to noise and missing data, and the ability to generate foreground-enriched data from the model. We demonstrate PCPCA's performance on case-control data through a series of simulations, and we successfully identify variation specific to case data in genomic case-control experiments spanning several data modalities, including gene expression, protein expression, and images.
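The abstract does not spell out the PCPCA estimator. As a minimal sketch, assuming a closed-form maximum-likelihood solution analogous to probabilistic PCA, one can eigendecompose a contrastive covariance C = (XᵀX − γYᵀY)/(n − γm), keep the top d eigenvectors, set the noise variance σ² to the mean of the discarded eigenvalues, and scale each kept eigenvector by sqrt(λ − σ²). The function name `pcpca_fit`, the requirement n − γm > 0, and the simulated data are illustrative assumptions, not the authors' code. Setting γ = 0 reduces the sketch to probabilistic PCA of the foreground alone, which matches the abstract's claim that the model generalizes probabilistic PCA.

```python
import numpy as np


def pcpca_fit(X, Y, gamma, d):
    """Hypothetical closed-form PCPCA-style fit (illustrative sketch only).

    X: (n, D) foreground/case data; Y: (m, D) background/control data.
    Both are assumed column-centered. gamma >= 0 weights the background.
    Returns loadings W of shape (D, d) and noise variance sigma2.
    """
    n, D = X.shape
    m = Y.shape[0]
    if not 0 < d < D:
        raise ValueError("need 0 < d < D")
    denom = n - gamma * m
    if denom <= 0:
        raise ValueError("sketch assumes n - gamma * m > 0")
    # Assumed contrastive covariance: foreground scatter minus gamma * background scatter.
    C = (X.T @ X - gamma * (Y.T @ Y)) / denom
    evals, evecs = np.linalg.eigh(C)            # eigh returns ascending eigenvalues
    evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending
    # Noise variance = mean of the discarded eigenvalues, as in probabilistic PCA.
    sigma2 = evals[d:].mean()
    if sigma2 <= 0:
        raise ValueError("non-positive noise variance; try a smaller gamma")
    # Loadings: top-d eigenvectors scaled by sqrt(lambda_j - sigma2), rotation R = I.
    scale = np.sqrt(np.clip(evals[:d] - sigma2, 0.0, None))
    W = evecs[:, :d] * scale
    return W, sigma2


# Simulated example: both groups share large variance on axis 0;
# only the foreground (cases) has extra variance on axis 1.
rng = np.random.default_rng(0)
D = 5
Y = rng.normal(size=(2000, D)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
X = rng.normal(size=(2000, D)) * np.array([3.0, 2.5, 1.0, 1.0, 1.0])
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

W, s2 = pcpca_fit(X, Y, gamma=0.8, d=1)  # contrastive fit
W0, _ = pcpca_fit(X, Y, gamma=0.0, d=1)  # gamma = 0: probabilistic PCA of X
print(np.argmax(np.abs(W[:, 0])), np.argmax(np.abs(W0[:, 0])))
```

In this toy setup the contrastive fit should load mainly on axis 1, the case-specific direction, while the γ = 0 fit should load mainly on axis 0, the shared high-variance direction that ordinary (probabilistic) PCA favors. Larger γ suppresses background structure more strongly but shrinks n − γm, so the sketch rejects values that make the denominator or the noise variance non-positive.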