Universal heavy-ball method for nonconvex optimization under Hölder continuous Hessians

成果类型:
Article
署名作者:
Marumo, Naoki; Takeda, Akiko
署名单位:
University of Tokyo; RIKEN
刊物名称:
MATHEMATICAL PROGRAMMING
ISSN/ISSBN:
0025-5610
DOI:
10.1007/s10107-024-02100-4
发表日期:
2025
页码:
147-175
关键词:
regularized newton methods algorithms complexity BEHAVIOR
摘要:
We propose a new first-order method for minimizing nonconvex functions with Lipschitz continuous gradients and H & ouml;lder continuous Hessians. The proposed algorithm is a heavy-ball method equipped with two particular restart mechanisms. It finds a solution where the gradient norm is less than epsilon \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varepsilon $$\end{document} in O ( H nu 1 2 + 2 nu epsilon - 4 + 3 nu 2 + 2 nu ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$O(H_{\nu }<^>{\frac{1}{2 + 2 \nu }} \varepsilon <^>{- \frac{4 + 3 \nu }{2 + 2 \nu }})$$\end{document} function and gradient evaluations, where nu is an element of [ 0 , 1 ] \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu \in [0, 1]$$\end{document} and H nu \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_{\nu }$$\end{document} are the H & ouml;lder exponent and constant, respectively. This complexity result covers the classical bound of O ( epsilon - 2 ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$O(\varepsilon <^>{-2})$$\end{document} for nu = 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu = 0$$\end{document} and the state-of-the-art bound of O ( epsilon - 7 / 4 ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$O(\varepsilon <^>{-7/4})$$\end{document} for nu = 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu = 1$$\end{document} . Our algorithm is nu \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu $$\end{document} -independent and thus universal; it automatically achieves the above complexity bound with the optimal nu is an element of [ 0 , 1 ] \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu \in [0, 1]$$\end{document} without knowledge of H nu \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_{\nu }$$\end{document} . In addition, the algorithm does not require other problem-dependent parameters as input, including the gradient's Lipschitz constant or the target accuracy epsilon \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varepsilon $$\end{document} . Numerical results illustrate that the proposed method is promising.