BAYESIAN BI-CLUSTERING METHODS WITH APPLICATIONS IN COMPUTATIONAL BIOLOGY
Document type:
Article
Authors:
Yan, Han; Wu, Jiexing; Li, Yang; Liu, Jun S.
Affiliations:
Harvard University; Alphabet Inc.; Google Incorporated
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/22-AOAS1622
Publication date:
2022
Pages:
2804-2831
Keywords:
Variable selection
Likelihood
Models
Abstract:
Bi-clustering is a useful approach for analyzing large biological datasets when the observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach to tackling bi-clustering problems in moderate to high dimensions and propose three Bayesian bi-clustering models for categorical data that increase in complexity in how they model the distributions of features across bi-clusters. Our proposed methods apply to a wide range of scenarios: from situations where clusters are distinguishable only through a small subset of features that is masked by a large amount of noise, to situations where different groups of data are identified by different sets of features or the data exhibit hierarchical structures. Through simulation studies we show that our methods outperform existing (bi-)clustering methods both in identifying clusters and in recovering feature distributional patterns across bi-clusters. We further apply the developed approaches to a human genetic dataset, a human single-cell genomic dataset, and a collection of 1774 mouse genomic datasets, focusing on 58 genes from two pathways.
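The abstract does not spell out the three models. As a hypothetical illustration of one computation that the first scenario (clusters distinguishable only through a small subset of categorical features) calls for, the sketch below scores a single feature by comparing two Dirichlet-multinomial marginal likelihoods: one where the feature's category probabilities differ by cluster, and one where all observations share a single "noise" distribution. This is a generic conjugate-prior construction, not the authors' models; the function names, the symmetric Dirichlet prior, and the default `alpha=1.0` are assumptions made for illustration.

```python
from collections import Counter
from math import lgamma


def log_dm_marginal(values, n_categories, alpha=1.0):
    """Log marginal likelihood of categorical values (coded 0..K-1) under a
    symmetric Dirichlet(alpha) prior on the category probabilities
    (assumed prior, chosen for illustration)."""
    counts = Counter(values)
    n = len(values)
    total_alpha = alpha * n_categories
    out = lgamma(total_alpha) - lgamma(total_alpha + n)
    for k in range(n_categories):
        out += lgamma(alpha + counts.get(k, 0)) - lgamma(alpha)
    return out


def log_bayes_factor(feature, labels, n_categories, alpha=1.0):
    """Hypothetical score: log evidence that the feature's distribution
    differs by cluster label, minus log evidence that it is shared noise."""
    groups = {}
    for x, z in zip(feature, labels):
        groups.setdefault(z, []).append(x)
    # Cluster-specific model: independent Dirichlet-multinomial per cluster.
    informative = sum(
        log_dm_marginal(v, n_categories, alpha) for v in groups.values()
    )
    # Noise model: one Dirichlet-multinomial for all observations.
    noise = log_dm_marginal(list(feature), n_categories, alpha)
    return informative - noise


if __name__ == "__main__":
    labels = [0] * 6 + [1] * 6
    tracks_clusters = [0] * 6 + [2] * 6  # category switches with the cluster
    pure_noise = [0, 1, 2] * 4           # same mix of categories in both clusters
    print(log_bayes_factor(tracks_clusters, labels, 3))  # positive
    print(log_bayes_factor(pure_noise, labels, 3))       # negative
```

A positive score favors treating the feature as cluster-informative, a negative score favors treating it as noise. In a full sampler such a score would typically be recomputed as cluster labels are updated; how the paper actually couples feature selection with cluster assignment is not described here.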