
A unified data-adaptive framework for high dimensional change point detection

Document type:

Article

Authors:

Liu, Bin; Zhou, Cheng; Zhang, Xinsheng; Liu, Yufeng

Affiliations:

Fudan University; Tencent; University of North Carolina; University of North Carolina Chapel Hill

Journal:

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY

ISSN/ISBN:

1369-7412

DOI:

10.1111/rssb.12375

Publication date:

2020

Pages:

933-963

Keywords:

covariance structure; U-statistics; time-series; bootstrap; tests

Abstract:

In recent years, change point detection for a high dimensional data sequence has become increasingly important in many scientific fields such as biology and finance. The existing literature develops a variety of methods designed for either a specified parameter (e.g. the mean or covariance) or a particular alternative pattern (sparse or dense), but not for both scenarios simultaneously. To overcome this limitation, we provide a general framework for developing tests that are suitable for a large class of parameters, and also adaptive to various alternative scenarios. In particular, by generalizing the classical cumulative sum statistic, we construct the U-statistic-based cumulative sum matrix C. Two cases corresponding to common or different change point locations across the components are considered. We then propose two types of individual test statistics by aggregating C on the basis of the adjusted L_p-norm with p ∈ {1, …, ∞}. Combining the corresponding individual tests, we construct two types of data-adaptive tests for the two cases, which are both powerful under various alternative patterns. A multiplier bootstrap method is introduced for approximating the proposed test statistics' limiting distributions. With flexible dependence structure across co-ordinates and mild moment conditions, we show the optimality of our methods theoretically in terms of size and power by allowing the dimension d and the number of parameters q to be much larger than the sample size n. An R package called AdaptiveCpt is developed to implement our algorithms. Extensive simulation studies provide further support for our theory. An application to a comparative genomic hybridization data set also demonstrates the usefulness of our proposed methods.
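
For readers who want a concrete picture of the ingredients named in the abstract, the following is a minimal illustrative sketch, not the authors' procedure and not the interface of the AdaptiveCpt package. It assumes the simplest special case (a change in the mean with a common change point), uses the classical cumulative sum rather than the U-statistic-based matrix, uses a plain L_p norm rather than the adjusted one, and approximates the null distribution with a basic Gaussian multiplier bootstrap. All function names (`cusum_matrix`, `lp_norm`, `lp_statistic`, `bootstrap_p_value`) are hypothetical.

```python
import math
import random


def cusum_matrix(x):
    """Classical mean-change CUSUM. x is a list of n observations, each a list
    of d coordinates. Row k-1 of the result corresponds to split point k."""
    n, d = len(x), len(x[0])
    total = [sum(row[j] for row in x) for j in range(d)]
    running = [0.0] * d
    c = []
    for k in range(1, n):
        for j in range(d):
            running[j] += x[k - 1][j]
        scale = math.sqrt(k * (n - k) / n)
        # Rescaled difference between the mean before k and the mean after k.
        c.append([scale * (running[j] / k - (total[j] - running[j]) / (n - k))
                  for j in range(d)])
    return c


def lp_norm(values, p):
    """Plain L_p norm; p = math.inf gives the largest absolute entry."""
    if math.isinf(p):
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def lp_statistic(c, p):
    """Aggregate each row of the CUSUM matrix across coordinates with an L_p
    norm, then maximise over split points.
    Returns (statistic, estimated change point)."""
    norms = [lp_norm(row, p) for row in c]
    best = max(range(len(norms)), key=norms.__getitem__)
    return norms[best], best + 1


def bootstrap_p_value(x, p, n_boot=200, seed=0):
    """Gaussian multiplier bootstrap (illustrative assumption): multiply the
    centred observations by i.i.d. N(0, 1) weights and recompute the statistic."""
    rng = random.Random(seed)
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    observed, _ = lp_statistic(cusum_matrix(x), p)
    exceed = 0
    for _ in range(n_boot):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xb = [[e[i] * (x[i][j] - mean[j]) for j in range(d)] for i in range(n)]
        stat, _ = lp_statistic(cusum_matrix(xb), p)
        if stat >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_boot)
```

In this sketch a small p sums evidence over many coordinates, which tends to favour dense changes, while p = ∞ looks only at the largest coordinate, which tends to favour sparse changes. That is the intuition behind combining tests over several values of p, although the combination step itself is not shown here.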