TWO-SAMPLE HYPOTHESIS TESTING FOR INHOMOGENEOUS RANDOM GRAPHS
Document type:
Article
Authors:
Ghoshdastidar, Debarghya; Gutzeit, Maurilio; Carpentier, Alexandra; von Luxburg, Ulrike
Affiliations:
Eberhard Karls University of Tubingen; Otto von Guericke University
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/19-AOS1884
Publication date:
2020
Pages:
2208-2229
Keywords:
Community detection
Higher criticism
Inference
Dimension
Models
Abstract:
The study of networks leads to a wide range of high-dimensional inference problems. In many practical applications, one needs to draw inference from one or a few large sparse networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime, where the goal is to test between two populations of inhomogeneous random graphs defined on the same set of n vertices. The size of each population m is much smaller than n, and can even be a constant as small as 1. The critical question in this context is whether the problem is solvable for small m. We answer this question from a minimax testing perspective. Let P, Q be the population adjacencies of two sparse inhomogeneous random graph models, and d be a suitably defined distance function. Given a population of m graphs from each model, we derive minimax separation rates for the problem of testing P = Q against d(P, Q) > ρ. We observe that if m is small, then the minimax separation is too large for some popular choices of d, including total variation distance between corresponding distributions. This implies that some models that are widely separated in d cannot be distinguished for small m, and hence, the testing problem is generally not solvable in these cases. We also show that if m > 1, then the minimax separation is relatively small if d is the Frobenius norm or operator norm distance between P and Q. For m = 1, only the latter distance provides small minimax separation. Thus, for these distances, the problem is solvable for small m. We also present near-optimal two-sample tests in both cases, where tests are adaptive with respect to the sparsity level of the graphs.
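The setting described in the abstract can be illustrated with a minimal Python sketch. It assumes graphs with independent edges drawn from symmetric population adjacency matrices P and Q, and uses a sample-splitting statistic as a rough proxy for the squared Frobenius distance when m > 1. The function names, the normalization, and the parameter values are illustrative assumptions, not necessarily the tests proposed in the paper.

```python
import numpy as np

def sample_graphs(P, m, rng):
    """Draw m undirected graphs on n vertices; edge (i, j) appears
    independently with probability P[i, j] (no self-loops)."""
    n = P.shape[0]
    iu = np.triu_indices(n, k=1)
    graphs = np.zeros((m, n, n))
    for k in range(m):
        A = np.zeros((n, n))
        A[iu] = rng.random(len(iu[0])) < P[iu]
        graphs[k] = A + A.T  # symmetrize
    return graphs

def frobenius_split_statistic(A, B):
    """Sample-splitting statistic (illustrative assumption, needs m >= 2).
    The two halves are independent, so the numerator has mean zero
    when P = Q and grows with ||P - Q||_F^2 otherwise."""
    m, n = A.shape[0], A.shape[1]
    half = m // 2
    iu = np.triu_indices(n, k=1)
    d1 = (A[:half] - B[:half]).sum(axis=0)[iu]
    d2 = (A[half:] - B[half:]).sum(axis=0)[iu]
    s1 = (A[:half] + B[:half]).sum(axis=0)[iu]
    s2 = (A[half:] + B[half:]).sum(axis=0)[iu]
    # Data-driven scale; the floor of 1.0 avoids division by zero
    # on very sparse samples.
    scale = np.sqrt(max(np.sum(s1 * s2), 1.0))
    return np.sum(d1 * d2) / scale

def operator_norm_distance(P, Q):
    """Operator (spectral) norm of P - Q: its largest singular value."""
    return np.linalg.norm(P - Q, ord=2)

# Hypothetical example: n = 100 vertices, m = 4 graphs per population.
rng = np.random.default_rng(0)
n, m = 100, 4
P = np.full((n, n), 0.1)
Q = np.full((n, n), 0.5)
same = frobenius_split_statistic(sample_graphs(P, m, rng), sample_graphs(P, m, rng))
diff = frobenius_split_statistic(sample_graphs(P, m, rng), sample_graphs(Q, m, rng))
print("statistic when P = Q:", same)
print("statistic when P != Q:", diff)
print("operator norm distance:", operator_norm_distance(P, Q))
```

With populations that share the same P the statistic should fluctuate around zero, whereas well-separated P and Q should push it far from zero. The threshold for rejecting P = Q is deliberately left out, since choosing it is the substance of the minimax analysis.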