Nonparametric Tests for Homogeneity Based on Non-Bipartite Matching

成果类型:
Article
署名作者:
Ruth, David M.; Koyak, Robert A.
署名单位:
United States Department of Defense; United States Navy; United States Naval Academy; United States Department of Defense; United States Navy; Naval Postgraduate School
刊物名称:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISSBN:
0162-1459
DOI:
10.1198/jasa.2011.tm10576
发表日期:
2011
页码:
1615-1625
关键词:
false discovery rate change-point MULTIVARIATE distributions trees
摘要:
Given a sequence of observations, has a change occurred in the underlying probability distribution with respect to observation order? This problem of detecting change points arises in a variety of applications including health prognostics for mechanical systems, syndromic disease surveillance in geographically dispersed populations, anomaly detection in information networks, and multivariate process control in general. Detecting change points in high-dimensional settings is challenging, and most change-point methods for multidimensional problems rely upon distributional assumptions or the use of observation history to model probability distributions. We present three new nonparametric statistical tests for heterogeneity based on the combinatorial properties of minimum non-bipartite matching (MNBM). The key idea underlying each of these tests is that if a sequence of independent random observations undergoes a change in distribution-either an abrupt shift or a gradual drift-a MNBM based on inter-point distances tends to produce pairings that are closer in the sequence labeling than would be the case if the observations were drawn from the same distribution. Our tests follow on the work of Rosenbaum (2005) who used MNBM to derive a simple cross-match test statistic for the two-sample problem based on this idea. Similar ideas are present in the minimum spanning tree (MST) test derived by Friedman and Rafsky (1979, 1981). We extend these approaches by utilizing ensembles of orthogonal MNBMs which greatly increase information extraction from the data, leading to tests that compare favorably to parametric procedures while maintaining level and good power properties across distributions.