A control-theoretic perspective on optimal high-order optimization
Document type:
Article
Authors:
Lin, Tianyi; Jordan, Michael I.
Affiliations:
University of California System; University of California Berkeley
Journal:
MATHEMATICAL PROGRAMMING
ISSN/ISBN:
0025-5610
DOI:
10.1007/s10107-021-01721-3
Publication date:
2022
Pages:
929-975
Keywords:
proximal extragradient method
regularized Newton methods
damped inertial dynamics
monotone inclusions
evaluation complexity
convex optimization
convergence
system
algorithms
minimization
Abstract:
We provide a control-theoretic perspective on optimal tensor algorithms for minimizing a convex function in a finite-dimensional Euclidean space. Given a function $\Phi : \mathbb{R}^d \to \mathbb{R}$ that is convex and twice continuously differentiable, we study a closed-loop control system that is governed by the operators $\nabla \Phi$ and $\nabla^2 \Phi$ together with a feedback control law $\lambda(\cdot)$ satisfying the algebraic equation $(\lambda(t))^p \|\nabla \Phi(x(t))\|^{p-1} = \theta$ for some $\theta \in (0, 1)$. Our first contribution is to prove the existence and uniqueness of a local solution to this system via the Banach fixed-point theorem. We present a simple yet nontrivial Lyapunov function that allows us to establish the existence and uniqueness of a global solution under certain regularity conditions and analyze the convergence properties of trajectories. The rate of convergence is $O(1/t^{(3p+1)/2})$ in terms of objective function gap and $O(1/t^{3p})$ in terms of squared gradient norm. Our second contribution is to provide two algorithmic frameworks obtained from discretization of our continuous-time system, one of which generalizes the large-step A-HPE framework of Monteiro and Svaiter (SIAM J Optim 23(2):1092-1125, 2013) and the other of which leads to a new optimal p-th order tensor algorithm. While our discrete-time analysis can be seen as a simplification and generalization of Monteiro and Svaiter (2013), it is largely motivated by the aforementioned continuous-time analysis, demonstrating the fundamental role that the feedback control plays in optimal acceleration and the clear advantage that the continuous-time perspective brings to algorithmic design. A highlight of our analysis is that we show that all of the p-th order optimal tensor algorithms that we discuss minimize the squared gradient norm at a rate of $O(k^{-3p})$, which complements the recent analysis in Gasnikov et al. (in: COLT, PMLR, pp 1374-1391, 2019), Jiang et al. (in: COLT, PMLR, pp 1799-1801, 2019) and Bubeck et al. (in: COLT, PMLR, pp 492-507, 2019).
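
The algebraic equation in the abstract, $(\lambda(t))^p \|\nabla \Phi(x(t))\|^{p-1} = \theta$, can be solved pointwise for $\lambda$ whenever the gradient is nonzero, giving $\lambda = (\theta / \|\nabla \Phi(x)\|^{p-1})^{1/p}$. The following is a minimal Python sketch of that single step only. The function name `feedback_lambda` and the sample values are illustrative assumptions, not taken from the paper, and the sketch does not simulate the closed-loop dynamics or any of the discretized algorithms.

```python
import math


def feedback_lambda(grad, p, theta):
    """Solve lambda**p * ||grad||**(p-1) = theta for lambda > 0.

    grad  : sequence of floats, the gradient of Phi at the current point
    p     : integer order, p >= 1
    theta : constant in the open interval (0, 1)

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0 and p > 1:
        # With a zero gradient and p > 1 the equation has no finite solution.
        raise ValueError("gradient is zero; no finite lambda satisfies the equation")
    return (theta / norm ** (p - 1)) ** (1.0 / p)


# Illustrative values: a gradient of norm 5, second-order case p = 2.
lam = feedback_lambda([3.0, 4.0], p=2, theta=0.5)
residual = lam ** 2 * 5.0 - 0.5  # should be close to zero
```

For $p = 1$ the gradient norm drops out of the equation and the sketch returns $\lambda = \theta$. For $p > 1$ the value of $\lambda$ grows as the gradient norm shrinks.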