Estimating and Accounting for Unobserved Covariates in High-Dimensional Correlated Data

Document type:
Article
Authors:
McKennan, Chris; Nicolae, Dan
Affiliations:
Pennsylvania Commonwealth System of Higher Education (PCSHE); University of Pittsburgh; University of Chicago
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN:
0162-1459
DOI:
10.1080/01621459.2020.1769635
Publication date:
2022
Pages:
225-236
Keywords:
panel-data models; bi-cross-validation; DNA methylation; gene expression; RNA-seq
Abstract:
Many high-dimensional and high-throughput biological datasets have complex sample correlation structures. Examples include longitudinal and multiple-tissue data, as well as data with multiple treatment conditions or related individuals. These data, like nearly all high-throughput omic data, are influenced by technical and biological factors unknown to the researcher, which, if unaccounted for, can severely obscure estimation of, and inference on, the effects of interest. We therefore developed CBCV and CorrConf: provably accurate and computationally efficient methods that, respectively, choose the number of latent confounding factors and estimate those factors in high-dimensional data with correlated or nonexchangeable residuals. We demonstrate that each method outperforms other state-of-the-art methods by analyzing simulated multi-tissue gene expression data and by identifying sex-associated DNA methylation sites in a real, longitudinal twin study. Supplementary materials for this article are available online.
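The abstract does not describe how CBCV works. The sketch below only illustrates the underlying idea named in the keywords: plain bi-cross-validation of the SVD, in the style of Owen and Perry, used to pick the number of latent factors in a features-by-samples matrix. It assumes independent, exchangeable samples, so it is NOT CBCV (which targets correlated residuals) and it does not estimate the factors as CorrConf does. The function names `bcv_errors` and `choose_rank`, the fold counts, and the simulated data are all hypothetical choices for illustration.

```python
import numpy as np


def bcv_errors(Y, max_rank, n_row_folds=2, n_col_folds=2, seed=0):
    """Plain bi-cross-validation error for ranks 0..max_rank (hypothetical helper).

    Y is a p x n matrix (features x samples). For each held-out block A,
    the matrix is viewed as [[A, B], [C, D]] and A is predicted by
    B @ pinv(D_k) @ C, where D_k is the rank-k truncated SVD of D.
    Assumes independent samples; this is not CBCV.
    """
    rng = np.random.default_rng(seed)
    p, n = Y.shape
    row_folds = np.array_split(rng.permutation(p), n_row_folds)
    col_folds = np.array_split(rng.permutation(n), n_col_folds)
    errors = np.zeros(max_rank + 1)
    for rows in row_folds:
        other_rows = np.setdiff1d(np.arange(p), rows)
        for cols in col_folds:
            other_cols = np.setdiff1d(np.arange(n), cols)
            A = Y[np.ix_(rows, cols)]              # held-out block
            B = Y[np.ix_(rows, other_cols)]
            C = Y[np.ix_(other_rows, cols)]
            D = Y[np.ix_(other_rows, other_cols)]  # fully observed block
            U, s, Vt = np.linalg.svd(D, full_matrices=False)
            for k in range(max_rank + 1):
                # Pseudo-inverse of the rank-k truncation: V_k diag(1/s_k) U_k^T
                D_pinv = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T
                A_hat = B @ D_pinv @ C
                errors[k] += np.sum((A - A_hat) ** 2)
    return errors


def choose_rank(Y, max_rank, **kwargs):
    """Return the rank with the smallest summed held-out error, plus all errors."""
    errors = bcv_errors(Y, max_rank, **kwargs)
    return int(np.argmin(errors)), errors


# Hypothetical simulation: 200 features, 60 independent samples, 3 latent factors.
rng = np.random.default_rng(1)
p, n, k_true = 200, 60, 3
loadings = rng.normal(size=(p, k_true))
factors = rng.normal(size=(n, k_true))
Y = loadings @ factors.T + 0.1 * rng.normal(size=(p, n))

k_hat, errs = choose_rank(Y, max_rank=8)
print(k_hat, np.round(errs, 1))
```

Holding out a whole block of rows and columns at once means the held-out entries never inform the SVD used to predict them, which is what lets the error curve penalize both too few and too many factors. With correlated samples (the setting of this article), this independence assumption breaks down, which is the gap CBCV is designed to address.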