BAYESIAN COMBINATORIAL MULTISTUDY FACTOR ANALYSIS

Document type:
Article
Authors:
Grabski, Isabella N.; De Vito, Roberta; Trippa, Lorenzo; Parmigiani, Giovanni
Affiliations:
Harvard University; Harvard T.H. Chan School of Public Health; Brown University; Harvard University; Harvard University Medical Affiliates; Dana-Farber Cancer Institute
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/22-AOAS1715
Publication date:
2023
Pages:
2212-2235
Keywords:
model selection
Abstract:
Mutations in the BRCA1 and BRCA2 genes are known to be highly associated with breast cancer. Identifying both shared and unique transcript expression patterns in blood samples from these groups can shed light on whether and how the disease mechanisms differ among individuals by mutation status, but this is challenging in the high-dimensional setting. A recent method, Bayesian multistudy factor analysis (BMSFA), identifies latent factors common to all studies (or equivalently, groups) and latent factors specific to individual studies. However, BMSFA does not allow for factors shared by more than one but fewer than all studies. This is critical in our context, as we may expect some but not all signals to be shared by BRCA1- and BRCA2-mutation carriers but not necessarily by other high-risk groups. We extend BMSFA by introducing a new method, Tetris, for Bayesian combinatorial multistudy factor analysis, which identifies latent factors that any combination of studies or groups can share. We model the subsets of studies that share latent factors with an Indian buffet process and offer a way to summarize uncertainty in the sharing patterns using credible balls. We test our method with an extensive range of simulations and showcase its utility not only in dimension reduction but also in covariance estimation. When applied to transcript expression data from high-risk families grouped by mutation status, Tetris reveals the features and pathways characterizing each group and the sharing patterns among them. Finally, we further extend Tetris to discover groupings of samples when group labels are not provided, which can elucidate additional structure in these data.
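The abstract states that the subsets of studies sharing each latent factor are modeled with an Indian buffet process. As a rough illustration of what such a prior produces, the sketch below draws a binary studies-by-factors matrix from the standard one-parameter Indian buffet process and labels each factor as common, partially shared, or study-specific. It is not the Tetris implementation: the function names `sample_ibp` and `describe_sharing`, the concentration parameter `alpha`, and the choice of the one-parameter form are assumptions made here for illustration, and the record above does not give the paper's exact prior or sampler.

```python
import numpy as np


def sample_ibp(num_studies, alpha, rng):
    """Draw a binary sharing matrix from a one-parameter Indian buffet process.

    Rows are studies and columns are latent factors; entry (s, k) is 1 when
    study s uses factor k. `alpha` is a hypothetical concentration parameter.
    """
    columns = []  # one length-num_studies 0/1 array per factor
    for s in range(num_studies):
        # Existing factors: study s joins factor k with probability
        # m_k / (s + 1), where m_k counts earlier studies already using it.
        for col in columns:
            m_k = col[:s].sum()
            col[s] = rng.random() < m_k / (s + 1)
        # New factors: study s introduces Poisson(alpha / (s + 1)) of them.
        for _ in range(rng.poisson(alpha / (s + 1))):
            col = np.zeros(num_studies, dtype=int)
            col[s] = 1
            columns.append(col)
    if not columns:
        return np.zeros((num_studies, 0), dtype=int)
    return np.column_stack(columns)


def describe_sharing(sharing):
    """Label each factor (column) by how many studies use it."""
    num_studies = sharing.shape[0]
    labels = []
    for col in sharing.T:
        count = int(col.sum())
        if count == num_studies:
            labels.append("common")
        elif count == 1:
            labels.append("study-specific")
        else:
            labels.append("partially shared")
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sharing = sample_ibp(num_studies=3, alpha=1.5, rng=rng)
    print(sharing)
    print(describe_sharing(sharing))
```

In this sketch, a model that allows only common and study-specific factors corresponds to matrices whose columns are either all ones or contain a single one. The "partially shared" columns are the combinations that, per the abstract, Tetris is designed to capture.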
Source URL: