TESTING FOR OUTLIERS WITH CONFORMAL P-VALUES

Publication type:
Article
Authors:
Bates, Stephen; Candes, Emmanuel; Lei, Lihua; Romano, Yaniv; Sesia, Matteo
Affiliations:
University of California System; University of California Berkeley; Stanford University; Technion Israel Institute of Technology; University of Southern California
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/22-AOS2244
Publication date:
2023
Pages:
149-178
Keywords:
false discovery rate; anomaly detection; multiple
Abstract:
This paper studies the construction of p-values for nonparametric outlier detection, from a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a general framework yielding p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Further, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by experiments on real and simulated data.
Source URL:
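
Illustrative sketch (not part of the record):
The marginal conformal p-values mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that a larger score means "more outlying", that a held-out calibration set of inlier scores is available, and it uses the absolute value of each observation as a stand-in for a fitted outlier detector. The function names `conformal_pvalues` and `benjamini_hochberg` are hypothetical, and the Benjamini-Hochberg procedure is used here as one standard way to target false discovery rate control.

```python
import numpy as np


def conformal_pvalues(cal_scores, test_scores):
    """Marginal conformal p-values (assumes larger score = more outlying).

    p_j = (1 + #{calibration scores >= test score j}) / (n + 1)
    """
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # searchsorted(side="left") counts calibration scores strictly below
    # each test score, so n minus that count is the number that are >=.
    n_ge = n - np.searchsorted(cal, test, side="left")
    return (1.0 + n_ge) / (n + 1.0)


def benjamini_hochberg(pvals, alpha=0.1):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        # Reject everything up to the largest sorted index under its threshold.
        k = np.max(np.nonzero(below)[0])
        rejected[order[: k + 1]] = True
    return rejected


# Toy usage on simulated data (assumption: score = |x|, no model is fitted).
rng = np.random.default_rng(0)
calibration = np.abs(rng.normal(0.0, 1.0, size=1000))  # inlier scores
test_inliers = np.abs(rng.normal(0.0, 1.0, size=90))
test_outliers = np.abs(rng.normal(4.0, 1.0, size=10))
test = np.concatenate([test_inliers, test_outliers])

pvals = conformal_pvalues(calibration, test)
flagged = benjamini_hochberg(pvals, alpha=0.1)
print("number of test points flagged as outliers:", int(flagged.sum()))
```

Because every test point is compared against the same calibration scores, the resulting p-values are mutually dependent, which is the setting the abstract addresses. The calibration-conditional p-values that the abstract also mentions are not sketched here.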