Network Cross-Validation for Determining the Number of Communities in Network Data

成果类型:
Article
署名作者:
Chen, Kehui; Lei, Jing
署名单位:
Pennsylvania Commonwealth System of Higher Education (PCSHE); University of Pittsburgh; Pennsylvania Commonwealth System of Higher Education (PCSHE); University of Pittsburgh; Carnegie Mellon University
刊物名称:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISSBN:
0162-1459
DOI:
10.1080/01621459.2016.1246365
发表日期:
2018
页码:
241-251
关键词:
stochastic block model bayesian-inference blockmodels Consistency
摘要:
The stochastic block model (SBM) and its variants have been a popular tool for analyzing large network data with community structures. In this article, we develop an efficient network cross-validation (NCV) approach to determine the number of communities, as well as to choose between the regular stochastic block model and the degree corrected block model (DCBM). The proposed NCV method is based on a block-wise node-pair splitting technique, combined with an integrated step of community recovery using sub-blocks of the adjacency matrix. We prove that the probability of under-selection vanishes as the number of nodes increases, under mild conditions satisfied by a wide range of popular community recovery algorithms. The solid performance of our method is also demonstrated in extensive simulations and two data examples. Supplementary materials for this article are available online.