Acceleration by stepsize hedging: Silver Stepsize Schedule for smooth convex optimization

成果类型:
Article
署名作者:
Altschuler, Jason M.; Parrilo, Pablo A.
署名单位:
University of Pennsylvania; Massachusetts Institute of Technology (MIT)
刊物名称:
MATHEMATICAL PROGRAMMING
ISSN/ISSBN:
0025-5610
DOI:
10.1007/s10107-024-02164-2
发表日期:
2025
页码:
1105-1118
关键词:
performance
摘要:
We provide a concise, self-contained proof that the Silver Stepsize Schedule proposed in our companion paper directly applies to smooth (non-strongly) convex optimization. Specifically, we show that with these stepsizes, gradient descent computes an epsilon\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varepsilon $$\end{document}-minimizer in O(epsilon-log rho 2)=O(epsilon-0.7864)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$O(\varepsilon <^>{-\log _{\rho } 2}) = O(\varepsilon <^>{-0.7864})$$\end{document} iterations, where rho=1+2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho = 1+\sqrt{2}$$\end{document} is the silver ratio. This is intermediate between the textbook unaccelerated rate O(epsilon-1)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$O(\varepsilon <^>{-1})$$\end{document} and the accelerated rate O(epsilon-1/2)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$O(\varepsilon <^>{-1/2})$$\end{document} due to Nesterov in 1983. The Silver Stepsize Schedule is a simple explicit fractal: the i-th stepsize is 1+rho nu(i)-1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1 + \rho <^>{\nu (i)-1}$$\end{document} where nu(i)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu (i)$$\end{document} is the 2-adic valuation of i. The design and analysis are conceptually identical to the strongly convex setting in our companion paper, but simplify remarkably in this specific setting.