Homogenization of SGD in high-dimensions: exact dynamics and generalization properties

Publication type:
Article; Early Access
Authors:
Paquette, Courtney; Paquette, Elliot; Adlam, Ben; Pennington, Jeffrey
Affiliations:
McGill University; Alphabet Inc.; DeepMind
Journal:
MATHEMATICAL PROGRAMMING
ISSN/ISBN:
0025-5610
DOI:
10.1007/s10107-024-02171-3
Publication date:
2024
Keywords:
Abstract:
We develop a stochastic differential equation, called homogenized SGD, for analyzing the dynamics of stochastic gradient descent (SGD) on a high-dimensional random least squares problem with l2-regularization. We show that homogenized SGD is the high-dimensional equivalent of SGD: for any quadratic statistic (e.g., population risk with quadratic loss), the statistic under the iterates of SGD converges to the statistic under homogenized SGD when the number of samples n and number of features d are polynomially related (d^c <= n <= d^{1/c} for some c > 0). By analyzing homogenized SGD, we provide exact non-asymptotic high-dimensional expressions for the generalization performance of SGD in terms of the solution of a Volterra integral equation. Further, we provide the exact value of the limiting excess risk in the case of quadratic losses when trained by SGD. The analysis is formulated for data matrices and target vectors that satisfy a family of resolvent conditions, which can roughly be viewed as a weak (non-quantitative) form of delocalization of sample-side singular vectors of the data. Several motivating applications are provided, including sample covariance matrices with independent samples and random features with non-generative model targets.
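The setting the abstract describes can be illustrated with a minimal numerical sketch (not taken from the paper): single-sample SGD on an l2-regularized random least-squares problem, tracking the empirical risk over one pass through the data. The dimensions, step size, regularization strength, and Gaussian data model below are arbitrary illustrative choices, not the assumptions of the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 500           # samples and features (polynomially related)
delta = 0.01               # l2-regularization strength (arbitrary choice)
gamma = 0.3                # constant SGD step size (arbitrary choice)

# Random data model: rows of A have norm roughly 1, noisy linear targets.
A = rng.standard_normal((n, d)) / np.sqrt(d)
x_star = rng.standard_normal(d)                # ground-truth signal
b = A @ x_star + 0.1 * rng.standard_normal(n)  # noisy targets

def risk(x):
    """Regularized empirical least-squares risk."""
    return 0.5 * np.mean((A @ x - b) ** 2) + 0.5 * delta * np.sum(x ** 2)

x = np.zeros(d)
risks = [risk(x)]
for k in rng.permutation(n):  # one pass of single-sample SGD
    # Unbiased estimate of the gradient of the regularized mean risk.
    grad = (A[k] @ x - b[k]) * A[k] + delta * x
    x = x - gamma * grad
    risks.append(risk(x))
```

The recorded `risks` trajectory is the kind of quadratic statistic whose high-dimensional limit the paper characterizes via a Volterra integral equation; here it simply decreases over the pass.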