ROBUST MACHINE LEARNING BY MEDIAN-OF-MEANS: THEORY AND PRACTICE

Publication type:
Article
Authors:
Lecué, Guillaume; Lerasle, Matthieu
Affiliations:
Institut Polytechnique de Paris; ENSAE Paris; École Polytechnique; Université Paris-Saclay; Centre National de la Recherche Scientifique (CNRS)
Journal:
ANNALS OF STATISTICS
ISSN:
0090-5364
DOI:
10.1214/19-AOS1828
Publication year:
2020
Pages:
906-931
Keywords:
model selection; variable selection; risk minimization; regularization; convergence; Lasso estimators; regression; recovery; bounds
Abstract:
Median-of-means (MOM) based procedures have recently been introduced in learning theory (Lugosi and Mendelson (2019); Lecué and Lerasle (2017)). These estimators outperform classical least-squares estimators when data are heavy-tailed and/or corrupted. However, none of these procedures can be implemented in practice, which is the major issue with current MOM procedures (Ann. Statist. 47 (2019) 783-794). In this paper, we introduce minmax MOM estimators and show that they achieve the same sub-Gaussian deviation bounds as the alternatives (Lugosi and Mendelson (2019); Lecué and Lerasle (2017)), in both small- and high-dimensional statistics. In particular, these estimators are efficient under moment assumptions on data that may have been corrupted by a few outliers. Beyond these theoretical guarantees, the definition of minmax MOM estimators suggests simple and systematic modifications of standard algorithms used to approximate least-squares estimators and their regularized versions. As a proof of concept, we perform an extensive simulation study of these algorithms for robust versions of the LASSO.
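The "simple and systematic modification" the abstract alludes to can be illustrated with a hedged sketch: in an ISTA-style iteration for the LASSO, the full-sample gradient is replaced by the gradient computed on the data block achieving the median of the block-wise empirical losses. This is an illustrative simplification, not the paper's exact minmax MOM algorithm; the function names (`mom_lasso`, `soft_threshold`), the step size, and the number of blocks `K` are all assumptions chosen for the example.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the l1 norm (standard LASSO shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def mom_lasso(X, y, lam, K=20, step=0.05, n_iter=500, seed=0):
    """Illustrative MOM-modified ISTA for the LASSO (a sketch, not the
    paper's exact procedure): each iteration splits the data into K random
    blocks, selects the block whose empirical squared loss is the median of
    the block-wise losses, and takes a proximal gradient step on that block.
    Outliers concentrate in a few blocks and are thus ignored by the median."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        blocks = np.array_split(rng.permutation(n), K)
        # Per-block empirical squared losses at the current iterate.
        losses = [np.mean((y[b] - X[b] @ beta) ** 2) for b in blocks]
        med = blocks[np.argsort(losses)[K // 2]]
        # Gradient of the squared loss on the median block only.
        g = -2.0 * X[med].T @ (y[med] - X[med] @ beta) / len(med)
        beta = soft_threshold(beta - step * g, step * lam)
    return beta
```

Because only the median block drives each step, a handful of grossly corrupted observations (fewer than half the blocks) cannot steer the iterates, which is the heuristic behind the robustness guarantees discussed in the abstract.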