Hierarchical Community Detection by Recursive Partitioning
Document type:
Article
Authors:
Li, Tianxi; Lei, Lihua; Bhattacharyya, Sharmodeep; Van den Berge, Koen; Sarkar, Purnamrita; Bickel, Peter J.; Levina, Elizaveta
Affiliations:
University of Virginia; Stanford University; Oregon State University; University of California System; University of California Berkeley; Ghent University; University of Texas System; University of Texas Austin; University of Michigan System; University of Michigan
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2020.1833888
Publication date:
2022
Pages:
951-968
Keywords:
maximum-likelihood
gene ontology
networks
consistency
regularization
database
models
graphs
Abstract:
The problem of community detection in networks is usually formulated as finding a single partition of the network into some correct number of communities. We argue that it is more interpretable and in some regimes more accurate to construct a hierarchical tree of communities instead. This can be done with a simple top-down recursive partitioning algorithm, starting with a single community and separating the nodes into two communities by spectral clustering repeatedly, until a stopping rule suggests there are no further communities. This class of algorithms is model-free, computationally efficient, and requires no tuning other than selecting a stopping rule. We show that there are regimes where this approach outperforms K-way spectral clustering, and propose a natural framework for analyzing the algorithm's theoretical performance, the binary tree stochastic block model. Under this model, we prove that the algorithm correctly recovers the entire community tree under relatively mild assumptions. We apply the algorithm to a gene network based on gene co-occurrence in 1580 research papers on anemia, and identify six clusters of genes in a meaningful hierarchy. We also illustrate the algorithm on a dataset of statistics papers. Supplementary materials for this article are available online.
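The top-down procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the sign-of-second-eigenvector split, and in particular the stopping rule (compare the second-largest eigenvalue with `scale * sqrt(mean degree)`) are assumptions chosen for illustration; the abstract leaves the stopping rule open, so any other rule could be substituted. The helper `four_leaf_tree_matrix` is likewise hypothetical and only builds a small tree-structured connectivity matrix for experimentation.

```python
import numpy as np


def split_by_second_eigenvector(A):
    """Two-way spectral split of a symmetric matrix A.

    Returns (second_largest_eigenvalue, boolean_mask); the mask is the sign
    pattern of the eigenvector belonging to that eigenvalue.
    """
    eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    return eigvals[-2], eigvecs[:, -2] >= 0


def recursive_bipartition(A, nodes=None, min_size=2, scale=2.0):
    """Build a community tree by repeated two-way splitting.

    A leaf is a sorted list of node indices; an internal node is a
    (left_subtree, right_subtree) tuple.
    """
    if nodes is None:
        nodes = np.arange(A.shape[0])
    if len(nodes) < 2 * min_size:
        return sorted(nodes.tolist())

    sub = A[np.ix_(nodes, nodes)]
    second_eig, mask = split_by_second_eigenvector(sub)

    # ASSUMED stopping rule (illustrative only): treat the block as a single
    # community when the second eigenvalue is no larger than a rough noise
    # level of scale * sqrt(mean degree).
    mean_degree = sub.sum() / len(nodes)
    if second_eig <= scale * np.sqrt(mean_degree):
        return sorted(nodes.tolist())

    left, right = nodes[mask], nodes[~mask]
    if min(len(left), len(right)) < min_size:
        return sorted(nodes.tolist())
    return (recursive_bipartition(A, left, min_size, scale),
            recursive_bipartition(A, right, min_size, scale))


def leaves(tree):
    """Flatten a community tree into its list of leaf communities."""
    if isinstance(tree, list):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def four_leaf_tree_matrix(m, a, b, c):
    """Hypothetical helper: connectivity matrix for 4 communities of size m.

    Entries are a within a community, b between sibling communities,
    and c across the top-level split; the diagonal is zero.
    """
    labels = np.repeat(np.arange(4), m)
    same_leaf = labels[:, None] == labels[None, :]
    same_half = (labels[:, None] // 2) == (labels[None, :] // 2)
    P = np.where(same_leaf, a, np.where(same_half, b, c))
    np.fill_diagonal(P, 0.0)
    return P
```

To try it on a random graph, one could draw a symmetric 0/1 adjacency matrix with edge probabilities taken from `four_leaf_tree_matrix(...)` and pass it to `recursive_bipartition`. How well the tree is recovered then depends on the network size, the gaps between the connection probabilities, and the chosen stopping rule.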