Understanding the acceleration phenomenon via high-resolution differential equations
Publication type:
Article
Authors:
Shi, Bin; Du, Simon S.; Jordan, Michael I.; Su, Weijie J.
Affiliations:
Chinese Academy of Sciences; Academy of Mathematics & System Sciences, CAS; University of Washington; University of Washington Seattle; University of California System; University of California Berkeley; University of Pennsylvania
Journal:
MATHEMATICAL PROGRAMMING
ISSN/ISBN:
0025-5610
DOI:
10.1007/s10107-021-01681-8
Publication date:
2022
Pages:
79-148
Keywords:
convergence
optimization
algorithm
system
Abstract:
Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms, Nesterov's accelerated gradient method for strongly convex functions (NAG-SC) and Polyak's heavy-ball method, we study an alternative limiting process that yields high-resolution ODEs. We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak's heavy-ball method, but they also allow the identification of a term that we refer to as the gradient correction, which is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods. We also use the high-resolution ODE framework to study Nesterov's accelerated gradient method for (non-strongly) convex functions (NAG-C), uncovering a hitherto unknown result: NAG-C minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG-C for smooth convex functions.
Source URL:
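
Illustrative code (not from the paper): the distinction the abstract draws between NAG-SC and the heavy-ball method reduces, at the level of the iterations, to where the gradient is evaluated, and the paper's high-resolution ODEs make this visible as a gradient-correction term involving the Hessian that appears for NAG-SC only. Below is a minimal Python sketch of the two iterations in their standard textbook forms; the step size s, the test quadratic, and the momentum choice beta = (1 - sqrt(mu*s)) / (1 + sqrt(mu*s)) are illustrative assumptions, not values prescribed by the paper.

    import numpy as np

    def heavy_ball(grad, x0, s, beta, iters):
        """Polyak's heavy-ball method: gradient taken at the current iterate x_k."""
        x_prev, x = x0.copy(), x0.copy()
        for _ in range(iters):
            x_next = x + beta * (x - x_prev) - s * grad(x)
            x_prev, x = x, x_next
        return x

    def nag_sc(grad, x0, s, beta, iters):
        """NAG-SC: gradient taken at the extrapolated point y_k, the source of
        the gradient-correction term in the high-resolution ODE."""
        x_prev, x = x0.copy(), x0.copy()
        for _ in range(iters):
            y = x + beta * (x - x_prev)   # momentum extrapolation
            x_next = y - s * grad(y)      # gradient evaluated at y, not x
            x_prev, x = x, x_next
        return x

    # Illustrative test problem (an assumption): f(x) = 0.5 * x^T A x,
    # a mu-strongly convex quadratic with mu = 1.
    A = np.diag([1.0, 10.0])
    grad = lambda x: A @ x
    mu, s = 1.0, 0.01
    beta = (1 - np.sqrt(mu * s)) / (1 + np.sqrt(mu * s))  # NAG-SC momentum
    x0 = np.array([5.0, 5.0])
    print(heavy_ball(grad, x0, s, beta, 200))
    print(nag_sc(grad, x0, s, beta, 200))

The only code-level change between the two functions is evaluating the gradient at y rather than x, yet this is precisely the change that the abstract credits, via the gradient correction, with the qualitative difference in convergence of the two methods.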