ROBUST k-MEANS CLUSTERING FOR DISTRIBUTIONS WITH TWO MOMENTS

成果类型:
Article
署名作者:
Klochkov, Yegor; Kroshnin, Alexey; Zhivotovskiy, Nikita
署名单位:
University of Cambridge; HSE University (National Research University Higher School of Economics); Russian Academy of Sciences; Kharkevich Institute for Information Transmission Problems of the RAS; Alphabet Inc.; Google Incorporated
刊物名称:
ANNALS OF STATISTICS
ISSN/ISSBN:
0090-5364
DOI:
10.1214/20-AOS2033
发表日期:
2021
页码:
2206-2230
关键词:
vector QUANTIZATION distortion bounds
摘要:
We consider the robust algorithms for the k-means clustering problem where a quantizer is constructed based on N independent observations. Our main results are median of means based nonasymptotic excess distortion bounds that hold under the two bounded moments assumption in a general separable Hilbert space. In particular, our results extend the renowned asymptotic result of (Ann. Statist. 9 (1981) 135-140) who showed that the existence of two moments is sufficient for strong consistency of an empirically optimal quantizer in R-d. In a special case of clustering in R-d, under two bounded moments, we prove matching (up to constant factors) nonasymptotic upper and lower bounds on the excess distortion, which depend on the probability mass of the lightest cluster of an optimal quantizer. Our bounds have the subGaussian form, and the proofs are based on the versions of uniform bounds for robust mean estimators.