Changepoint Detection in the Presence of Outliers

Document type:
Article
Authors:
Fearnhead, Paul; Rigaill, Guillem
Affiliations:
Lancaster University; Centre National de la Recherche Scientifique (CNRS); CNRS - National Institute for Biology (INSB); INRAE; Universite Paris Saclay; Universite Paris Cite; Centre National de la Recherche Scientifique (CNRS); CNRS - National Institute for Mathematical Sciences (INSMI); Universite Paris Saclay; INRAE; Ecole Nationale Superieure d'Informatique pour l'Industrie et l'Entreprise (ENSIIE)
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2017.1385466
Publication date:
2019
Pages:
169-183
Keywords:
dynamic-programming algorithm; least-squares estimation; change-point; binary segmentation; bayesian-inference; number
Abstract:
Many traditional methods for identifying changepoints can struggle in the presence of outliers, or when the noise is heavy-tailed. Often they will infer additional changepoints to fit the outliers. To overcome this problem, data often need to be preprocessed to remove outliers, though this is difficult for applications where the data need to be analyzed online. We present an approach to changepoint detection that is robust to the presence of outliers. The idea is to adapt existing penalized cost approaches for detecting changes so that they use loss functions that are less sensitive to outliers. We argue that loss functions that are bounded, such as the classical biweight loss, are particularly suitable, as we show that only bounded loss functions are robust to arbitrarily extreme outliers. We present an efficient dynamic programming algorithm that can find the optimal segmentation under our penalized cost criteria. Importantly, this algorithm can be used in settings where the data need to be analyzed online. We show that we can consistently estimate the number of changepoints, and accurately estimate their locations, using the biweight loss function. We demonstrate the usefulness of our approach for applications such as analyzing well-log data, detecting copy number variation, and detecting tampering of wireless devices. Supplementary materials for this article are available online.
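
Illustrative sketch (not part of the original record): the penalized cost idea in the abstract can be illustrated with a small dynamic program. This is not the authors' algorithm. It assumes the biweight loss is the squared error capped at K², and it restricts candidate segment means to a fixed grid so that the recursion stays a few lines long. The function names, the grid construction, and the parameter values are hypothetical choices made for this example only.

```python
import numpy as np


def biweight_loss(y, theta, K):
    # Assumed form of the bounded loss: squared error capped at K**2.
    return np.minimum((y - theta) ** 2, K ** 2)


def robust_changepoints(y, beta, K, grid_size=200):
    """Grid-based penalized-cost segmentation with a bounded loss.

    Returns the 0-based indices at which a new segment starts.
    beta is the penalty paid for each changepoint.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Candidate segment means (an assumption of this sketch): a grid over
    # the central range of the data, so that extreme outliers do not
    # stretch the grid.
    grid = np.linspace(np.percentile(y, 1), np.percentile(y, 99), grid_size)

    Q = np.zeros(grid_size)               # Q[g]: best cost so far if the current segment has mean grid[g]
    tau = np.zeros(grid_size, dtype=int)  # tau[g]: start index of that current segment
    last = np.zeros(n + 1, dtype=int)     # last[t]: start of the final segment in the best fit to y[:t]
    best_prev = 0.0                       # optimal penalized cost of y[:t-1]

    for t in range(1, n + 1):
        if t > 1:
            # Either continue the current segment or open a new one at index t-1.
            new_segment_cost = best_prev + beta
            open_new = new_segment_cost < Q
            Q = np.where(open_new, new_segment_cost, Q)
            tau = np.where(open_new, t - 1, tau)
        Q = Q + biweight_loss(y[t - 1], grid, K)
        g = int(np.argmin(Q))
        best_prev = Q[g]
        last[t] = tau[g]

    # Backtrack through the stored segment starts.
    changepoints = []
    t = n
    while last[t] > 0:
        t = int(last[t])
        changepoints.append(t)
    return sorted(changepoints)


# Example: a mean shift from 0 to 5 at index 50, plus one extreme outlier.
signs = np.where(np.arange(100) % 2 == 0, 1.0, -1.0)
y = np.concatenate([np.zeros(50), np.full(50, 5.0)]) + 0.1 * signs
y[20] = 100.0
print(robust_changepoints(y, beta=5.0, K=1.0))
```

Because the loss is capped, the outlier at index 20 can add at most K² to the cost of its segment, which is less than the price of the two extra changepoints needed to isolate it. The update at each step uses only the current observation and the running state, which is what makes this style of recursion usable online. The algorithm described in the abstract finds the optimal segmentation under its penalized cost criteria, whereas this sketch approximates the minimization over segment means with a grid.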