Boosting One-Point Derivative-Free Online Optimization via Residual Feedback

Publication type:
Article
Authors:
Zhang, Yan; Zhou, Yi; Ji, Kaiyi; Shen, Yi; Zavlanos, Michael M.
Affiliations:
Duke University; Utah System of Higher Education; University of Utah; University System of Ohio; Ohio State University
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2024.3382358
Publication year:
2024
Pages:
6309-6316
Keywords:
Optimization; Linear programming; Costs; Convergence; Reinforcement learning; Heuristic algorithms; Vectors; Online optimization; Regret analysis; Zeroth-order optimization (ZO)
Abstract:
Zeroth-order optimization (ZO) typically relies on two-point feedback to estimate the gradient of the objective function. Nevertheless, two-point feedback cannot be used for online optimization with time-varying objective functions, where only a single query of the function value is possible at each time step. In this work, we propose a new one-point feedback method for online optimization that estimates the gradient using the residual between two feedback points at consecutive time instants. Moreover, we develop regret bounds for ZO with residual feedback for constrained convex and unconstrained nonconvex online optimization problems. Specifically, for both deterministic and stochastic problems and for both Lipschitz and smooth objective functions, we show that using residual feedback can produce gradient estimates with much smaller variance compared to conventional one-point feedback methods. As a result, our regret bounds are much tighter compared to existing regret bounds for ZO with conventional one-point feedback, which suggests that ZO with residual feedback can better track the optimizer of online optimization problems. In addition, our regret bounds rely on weaker assumptions than those used in conventional one-point feedback methods. Numerical experiments show that ZO with residual feedback significantly outperforms existing one-point feedback methods.
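As a rough illustration of the mechanism the abstract describes, the Python sketch below performs one residual-feedback update: at each time step it makes a single query of the current objective at a randomly perturbed point and forms a gradient estimate from the difference (residual) between this query and the one recorded at the previous time step. The function and parameter names are illustrative only, and the unit-sphere perturbation, smoothing radius, and dimension scaling follow common conventions for one-point ZO estimators rather than the paper's exact formulation.

```python
import numpy as np

def residual_feedback_step(f_t, x, prev_value, delta=0.1, step_size=0.01, rng=None):
    """One unconstrained ZO step using residual (one-point) feedback.

    f_t        : callable giving the objective value at the current time step
    x          : current decision variable (1-D numpy array)
    prev_value : perturbed function value queried at the previous time step
    delta      : smoothing radius of the random perturbation
    step_size  : gradient-descent step size
    """
    rng = rng or np.random.default_rng()
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)               # random direction on the unit sphere
    value = f_t(x + delta * u)           # single function query at this time step
    # Residual feedback: gradient estimate from the difference of two
    # consecutive one-point queries, scaled by dimension and smoothing radius.
    grad_est = (d / delta) * (value - prev_value) * u
    x_next = x - step_size * grad_est
    return x_next, value

# Hypothetical usage: track the minimizer of a drifting quadratic objective.
if __name__ == "__main__":
    x = np.zeros(5)
    prev_value = 0.0
    for t in range(1000):
        target = np.full(5, np.sin(0.01 * t))          # time-varying optimum
        f_t = lambda z, c=target: np.sum((z - c) ** 2)  # only values are observable
        x, prev_value = residual_feedback_step(f_t, x, prev_value)
```

In this sketch only one new function value is needed per step, matching the online setting in the abstract where the objective changes between queries; storing the previous query is what allows the two-point-like difference to be formed from one-point feedback.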