Constrained Online Convex Optimization With Feedback Delays
成果类型:
Article
署名作者:
Cao, Xuanyu; Zhang, Junshan; Poor, H. Vincent
署名单位:
University of Illinois System; University of Illinois Urbana-Champaign; Arizona State University; Arizona State University-Tempe; Princeton University
刊物名称:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISSBN:
0018-9286
DOI:
10.1109/TAC.2020.3030743
发表日期:
2021
页码:
5049-5064
关键词:
Delays
optimization
power systems
Convex functions
Time factors
Benchmark testing
Collaborative work
Bandit feedback
constrained optimization
feedback delay
function feedback
online convex optimization (OCO)
摘要:
In this article, we study constrained online convex optimization (OCO) in the presence of feedback delays, where a decision maker chooses sequential actions without knowing the loss functions and constraint functions a priori. The loss/constraint functions vary with time and their feedback information is revealed to the decision maker with delays, which arise in many applications. We first consider the scenario of delayed function feedback, in which the complete information of the loss/constraint functions is revealed to the decision maker with delays. We develop a modified online saddle point algorithm suitable for constrained OCO with feedback delays. Sublinear regret and sublinear constraint violation bounds are established for the algorithm in terms of the delays. In practice, the complete information (functional forms) of the loss/constraint functions may not be revealed to the decision maker. Thus, we further examine the scenario of delayed bandit feedback, where only the values of the loss/constraint functions at two random points close to the chosen action are revealed to the decision maker with delays. A delayed version of the bandit online saddle point algorithm is proposed by utilizing stochastic gradient estimates of the loss/constraint functions based on delayed bandit feedback. We also establish sublinear regret and sublinear constraint violation bounds for this bandit optimization algorithm in terms of the delays. Finally, numerical results for online quadratically constrained quadratic programs are presented to corroborate the efficacy of the proposed algorithms.