GENERALIZED RESILIENCE AND ROBUST STATISTICS

Publication type:
Article
Authors:
Zhu, Banghua; Jiao, Jiantao; Steinhardt, Jacob
Affiliations:
University of California System; University of California Berkeley
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/22-AOS2186
Publication date:
2022
Pages:
2256-2283
Keywords:
regression
Abstract:
Robust statistics traditionally focuses on outliers, or perturbations in total variation (TV) distance. However, a dataset could be maliciously corrupted in many other ways, such as systematic measurement errors and missing covariates. We consider corruption in either TV or Wasserstein distance, and show that robust estimation is possible whenever the true population distribution satisfies a property called generalized resilience, which holds under moment or hypercontractive conditions. For the TV corruption model, our finite-sample analysis improves over previous results for mean estimation with bounded kth moment, linear regression, and joint mean and covariance estimation. For W_1 corruption, we provide the first finite-sample guarantees for second moment estimation and linear regression. Technically, our robust estimators are a generalization of minimum distance (MD) functionals, which project the corrupted distribution onto a given set of well-behaved distributions. The error of these MD functionals is bounded by a certain modulus of continuity, and we provide a systematic method for upper bounding this modulus for the class of generalized resilient distributions, which usually gives sharp population-level results and good finite-sample guarantees.
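
Illustrative note: the abstract does not define resilience, so the following is only a minimal sketch of the simplest special case, assuming the deletion-based notion for the mean of a one-dimensional empirical sample: a sample is resilient at level eps if deleting at most an eps-fraction of its points moves the mean only a little. The function name resilience_1d and the test data are hypothetical and are not taken from the paper, which treats far more general losses and also Wasserstein corruption. In one dimension the worst-case deletion removes either the largest or the smallest points, so the quantity can be computed by sorting.

```python
import random
import statistics


def resilience_1d(sample, eps):
    """Largest shift in the sample mean achievable by deleting at most
    an eps-fraction of the points (1-D, deletion-based sketch)."""
    if not 0 <= eps < 1:
        raise ValueError("eps must lie in [0, 1)")
    xs = sorted(sample)
    n = len(xs)
    # Number of points an adversary may delete; always keep at least one.
    k = min(int(eps * n), n - 1)
    if k == 0:
        return 0.0
    mu = statistics.fmean(xs)
    mean_without_top = statistics.fmean(xs[: n - k])  # k largest deleted
    mean_without_bottom = statistics.fmean(xs[k:])    # k smallest deleted
    return max(mu - mean_without_top, mean_without_bottom - mu)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    light = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    heavy = [rng.paretovariate(1.5) for _ in range(10_000)]
    # Heavier tails typically give a larger shift at the same eps.
    print("gaussian:", resilience_1d(light, 0.05))
    print("pareto:  ", resilience_1d(heavy, 0.05))
```

A small value means that every large subset of the sample has nearly the same mean as the whole sample, which is the intuition behind why the mean of such a distribution can still be estimated after a small fraction of the data has been corrupted.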