Dynamic Regret of Distributed Online Saddle Point Problem

成果类型:
Article
署名作者:
Zhang, Wentao; Shi, Yang; Zhang, Baoyong; Yuan, Deming; Xu, Shengyuan
署名单位:
Nanjing University of Science & Technology; University of Victoria
刊物名称:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISSBN:
0018-9286
DOI:
10.1109/TAC.2023.3312033
发表日期:
2024
页码:
2522-2529
关键词:
Heuristic algorithms optimization games estimation decision making Upper bound uncertainty Bandit feedback Distributed online optimization dynamic regret multiagent systems saddle point problem (SPP)
摘要:
This article is concerned with the distributed online saddle point problem for multiagent networks over time-varying graphs. The objective is to design distributed online algorithms for optimizing the accumulation of the time-varying loss functions, and further to analyze the efficiency of the algorithms by utilizing the so-called dynamic regret as a performance measure. To this end, two kinds of distributed online optimization algorithms are developed by considering full-information feedback and bandit feedback, respectively. Under some certain assumptions and resorting to appropriate diminishing step sizes, it is proved that the two developed algorithms yield upper bounds of the dynamic regret in order O(max{T-lambda 1,T-lambda 2(V-T+1)})and or-derO(max{kT(lambda 3),kT(lambda 4)(V-T+1)}), respectively, where T is the horizon (total iteration number) of the algorithm, V-T is the path-variation level, k is the maximum dimension of the state variables, and lambda(i), i=1,2,3,4$, are positive tuning parameters less than 1. Clearly, both of the two developed algorithms can guarantee that the considered dynamic regret is sublinear with respect to T, provided that the path-variation level V(T )is sublinear with respect to $T$. This theoretically shows the efficiency of the developed algorithms. Moreover, the efficiency of the developed algorithms is also illustrated numerically by taking an example of a specific bilinear matrix game problem.