SURPRISES IN HIGH-DIMENSIONAL RIDGELESS LEAST SQUARES INTERPOLATION

Publication Type:
Article
Authors:
Hastie, Trevor; Montanari, Andrea; Rosset, Saharon; Tibshirani, Ryan J.
Affiliations:
Stanford University; Tel Aviv University; Carnegie Mellon University
Journal:
ANNALS OF STATISTICS
ISSN:
0090-5364
DOI:
10.1214/21-AOS2133
Publication Date:
2022
Pages:
949-986
Keywords:
generalized cross-validation; asymptotic optimality; neural networks; regression; cl
Abstract:
Interpolators, that is, estimators that achieve zero training error, have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum ℓ2-norm (ridgeless) interpolation in least squares regression, focusing on the high-dimensional regime in which the number of unknown parameters p is of the same order as the number of samples n. We consider two different models for the feature distribution: a linear model, where the feature vectors x_i ∈ R^p are obtained by applying a linear transform to a vector of i.i.d. entries, x_i = Σ^{1/2} z_i (with z_i ∈ R^p); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, x_i = φ(W z_i) (with z_i ∈ R^d, W ∈ R^{p×d} a matrix of i.i.d. entries, and φ an activation function acting componentwise on W z_i). We recover, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the double descent behavior of the prediction risk and the potential benefits of overparametrization.
Source URL:
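
The following is a minimal NumPy sketch, not code from the paper, of the estimator the abstract studies: the minimum ℓ2-norm (ridgeless) interpolator β̂ = X⁺y, evaluated across aspect ratios γ = p/n under the linear feature model with the isotropic choice Σ = I. The sample size n, the signal-to-noise ratio, and the grid of p values are illustrative assumptions; the risk spike near γ = 1 followed by a second descent for large p is the double descent behavior the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_norm_interpolator(X, y):
    """Minimum l2-norm least-squares solution beta_hat = X^+ y.

    For p > n (and X of full row rank) this interpolates the training
    data exactly; it is the limit of ridge regression as the penalty -> 0+.
    """
    return np.linalg.pinv(X) @ y

# Hypothetical experiment sizes and signal strength (not from the paper).
n = 200        # number of samples
snr = 2.0      # squared signal norm ||beta||^2 (noise variance is 1)

print(" p/n    risk")
for p in [50, 100, 150, 190, 210, 300, 500, 1000]:
    # Linear feature model x_i = Sigma^{1/2} z_i with the isotropic
    # choice Sigma = I, so the design is simply i.i.d. Gaussian.
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p)
    beta *= np.sqrt(snr) / np.linalg.norm(beta)
    y = X @ beta + rng.standard_normal(n)

    beta_hat = min_norm_interpolator(X, y)

    # With isotropic features, the out-of-sample prediction risk
    # E[(x0 @ (beta_hat - beta))^2] equals ||beta_hat - beta||^2.
    risk = np.sum((beta_hat - beta) ** 2)
    print(f"{p / n:5.2f}  {risk:8.3f}")
```

The nonlinear model in the abstract would instead form the design as X = φ(Z @ W.T), with Z holding the inputs z_i ∈ R^d, W a random p×d weight matrix, and φ applied componentwise; the same minimum-norm interpolator is then fit to those features.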