A general asymptotic framework for distribution-free graph-based two-sample tests

Document type:
Article
Author:
Bhattacharya, Bhaswar B.
Affiliation:
University of Pennsylvania
Journal:
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
ISSN:
1369-7412
DOI:
10.1111/rssb.12319
Publication date:
2019
Pages:
575-602
Keywords:
limit theorems; data depth; multivariate
Abstract:
Testing the equality of two multivariate distributions is a classical problem for which many non-parametric tests have been proposed over the years. Most of the popular two-sample tests that are asymptotically distribution free are based either on geometric graphs constructed from the interpoint distances between the observations (multivariate generalizations of the Wald-Wolfowitz runs test) or on multivariate data depth (generalizations of the Mann-Whitney rank test). The paper introduces a general notion of distribution-free graph-based two-sample tests and provides a unified framework for analysing and comparing their asymptotic properties. The asymptotic (Pitman) efficiency of a general graph-based test is derived. The framework covers tests based on geometric graphs, such as the Friedman-Rafsky test, the test based on the K-nearest-neighbour graph, the cross-match test and the generalized edge count test, as well as tests based on multivariate depth functions (the Liu-Singh rank sum statistic). The results show how the combinatorial properties of the underlying graph affect the performance of the associated two-sample test, and they can be used to justify and guide the choice of test in practice. Applications of the results are illustrated on both synthetic and real data sets.
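To make the idea of a graph-based two-sample test concrete, here is a minimal Python sketch of a Friedman-Rafsky-type edge-count test. It builds a Euclidean minimum spanning tree (MST) on the pooled sample and counts the tree edges that join an observation from one sample to an observation from the other. Few such cross edges suggest that the two distributions differ. This is an illustrative sketch, not code from the paper. The function names (`mst_edges`, `cross_count`, `friedman_rafsky_test`), the Euclidean metric, the O(n^2) Prim's algorithm and the Monte Carlo permutation p-value are all assumptions made for this example. Because the tree is built from the pooled observations alone, under the null hypothesis every assignment of sample labels is equally likely given the pooled data. This is what makes permuting the labels a valid, distribution-free way to calibrate the statistic.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns the MST as (i, j) pairs."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from each vertex to the current tree
    parent = [-1] * n      # tree vertex achieving that cheapest distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the non-tree vertex that is closest to the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Update the distances of the remaining vertices to the enlarged tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def cross_count(edges, labels):
    """Number of graph edges that join an observation from sample 0 to one from sample 1."""
    return sum(1 for i, j in edges if labels[i] != labels[j])


def friedman_rafsky_test(xs, ys, n_perm=999, seed=0):
    """MST cross-edge count with a Monte Carlo permutation p-value (lower tail)."""
    pooled = list(xs) + list(ys)
    labels = [0] * len(xs) + [1] * len(ys)
    edges = mst_edges(pooled)  # the graph depends only on the pooled sample, not on labels
    observed = cross_count(edges, labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # relabel the pooled points at random; the graph is reused
        if cross_count(edges, labels) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Usage: two bivariate Gaussian samples whose means differ in the first coordinate.
rng = random.Random(1)
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
ys = [(rng.gauss(3, 1), rng.gauss(0, 1)) for _ in range(30)]
stat, p_value = friedman_rafsky_test(xs, ys, n_perm=199)
print(stat, p_value)
```

The counting step in `cross_count` works unchanged for any graph built from the pooled sample, such as a K-nearest-neighbour graph. Only the graph construction in `mst_edges` would need to change.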