A geometrical viewpoint on the benign overfitting property of the minimum $\ell_2$-norm interpolant estimator and its universality
Document type:
Article
Authors:
Lecue, Guillaume; Shang, Zong
Affiliations:
ESSEC Business School; Institut Polytechnique de Paris; Ecole Polytechnique; ENSAE Paris
Journal:
PROBABILITY THEORY AND RELATED FIELDS
ISSN/ISBN:
0178-8051
DOI:
10.1007/s00440-024-01336-7
Publication date:
2025
Pages:
1401-1484
Keywords:
recovery
Abstract:
In the linear regression model, the minimum $\ell_2$-norm interpolant estimator $\hat\beta$ has received much attention since it was proved to be consistent even though it fits noisy data perfectly, under some condition on the covariance matrix $\Sigma$ of the input vector; this phenomenon is known as benign overfitting. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from (Tsigler et al. in J Mach Learn Res 24(123):1-76, 2021). Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [5]: $\hat\beta$ can be written as a sum of a ridge estimator $\hat\beta_{1:k}$ and an overfitting component $\hat\beta_{k+1:p}$, which follows a decomposition of the feature space $\mathbb{R}^p = V_{1:k} \oplus^{\perp} V_{k+1:p}$ into the space $V_{1:k}$ spanned by the top $k$ eigenvectors of $\Sigma$ and the space $V_{k+1:p}$ spanned by the $p - k$ last ones. We also prove a matching lower bound for the expected prediction risk, and thus obtain necessary and sufficient conditions for benign overfitting of $\hat\beta$. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretzky-Milman theorem and the isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension, which appears naturally in our geometrical viewpoint, coincides with the effective rank from (Bartlett et al. in Proc Natl Acad Sci 117(48):30063-30070, 2020) and (Tsigler et al. in J Mach Learn Res 24(123):1-76, 2021), and is the key tool for handling the behavior of the design matrix restricted to the subspace $V_{k+1:p}$ where overfitting happens. We extend these results to heavy-tailed scenarios, proving the universality of this phenomenon beyond exponential moment assumptions. This universality was previously unknown, and establishing it is widely believed to be a significant challenge. It follows from an anisotropic version of the probabilistic Dvoretzky-Milman theorem that holds for heavy-tailed vectors, which is of independent interest.
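The objects named in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' construction: it assumes a diagonal covariance (so that $V_{1:k}$ and $V_{k+1:p}$ are coordinate subspaces), Gaussian inputs, and an arbitrary choice of dimensions, spectrum and noise level. The effective rank is taken to be $r_k(\Sigma) = \sum_{i>k} \lambda_i / \lambda_{k+1}$, the definition used in the cited Bartlett et al. paper; all function and variable names are hypothetical.

```python
import numpy as np

def min_norm_interpolant(X, y):
    """Minimum l2-norm solution of X @ beta = y (X assumed to have full row rank)."""
    return np.linalg.pinv(X) @ y

def effective_rank(eigvals, k):
    """r_k = (sum of the eigenvalues after the top k) / (the (k+1)-th eigenvalue)."""
    tail = np.sort(eigvals)[::-1][k:]
    return tail.sum() / tail[0]

# Illustrative setting (assumed, not from the paper): n samples, p >> n features,
# k large eigenvalues followed by a long flat tail of small ones.
rng = np.random.default_rng(0)
n, p, k = 20, 200, 5
eigvals = np.concatenate([np.full(k, 10.0), np.full(p - k, 0.1)])

# Rows of X are N(0, Sigma) with Sigma = diag(eigvals).
X = rng.standard_normal((n, p)) * np.sqrt(eigvals)
beta_star = np.zeros(p)
beta_star[:k] = 1.0
y = X @ beta_star + 0.5 * rng.standard_normal(n)

beta_hat = min_norm_interpolant(X, y)

# Split beta_hat along V_{1:k} and V_{k+1:p}; with a diagonal Sigma these are
# simply the first k coordinates and the remaining p - k coordinates.
beta_top = np.where(np.arange(p) < k, beta_hat, 0.0)
beta_tail = beta_hat - beta_top

interpolates = np.allclose(X @ beta_hat, y)   # the estimator fits the noisy data
r_k = effective_rank(eigvals, k)              # effective rank of the tail spectrum

print("interpolates:", interpolates)
print("effective rank r_k:", r_k)
print("norm on V_{1:k}:", np.linalg.norm(beta_top))
print("norm on V_{k+1:p}:", np.linalg.norm(beta_tail))
```

The sketch only checks that the estimator interpolates and splits it orthogonally along the two eigenspaces; it does not verify the paper's claim that the first component is a ridge estimator, nor any of its rates.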
Source URL: