Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback
Publication Type:
Article; Early Access
Authors:
Ba, Wenjia; Lin, Tianyi; Zhang, Jiawei; Zhou, Zhengyuan
Affiliations:
University of British Columbia; Columbia University; New York University
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.2021.0445
Publication Date:
2025
Keywords:
convergence
descent
Abstract:
We consider online no-regret learning in unknown games with bandit feedback, where each player can only observe its reward at each time, determined by all players' current joint action, rather than its gradient. We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\Theta(n\sqrt{T})$ under smooth and strongly concave reward functions ($n \geq 1$ is the problem dimension). We then show that if each player applies this no-regret learning algorithm in strongly monotone games, the joint action converges in the last iterate to the unique Nash equilibrium at a rate of $\Theta(nT^{-1/2})$. Prior to our work, the best-known convergence rate in the same class of games was $O(T^{-1/3})$.
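The paper's barrier-based algorithm is not reproduced here; the sketch below only illustrates the generic ingredient the abstract describes: single-point (bandit) gradient estimation, where each player observes one scalar reward per round and runs gradient ascent in self-play. The two-player quadratic game, the $\Theta(1/t)$ step size, and the shrinking exploration radius are illustrative assumptions, not the authors' construction.

```python
import numpy as np

# Hypothetical 2-player strongly monotone game, chosen only for illustration:
#   u_i(x) = -x_i^2 + 0.5 * x_0 * x_1,  unique Nash equilibrium at (0, 0).
def reward(i, x):
    return -x[i] ** 2 + 0.5 * x[0] * x[1]

rng = np.random.default_rng(0)
x = np.array([1.0, -1.0])    # initial joint action
T = 100_000
for t in range(1, T + 1):
    eta = 0.5 / t            # Theta(1/t) step size, valid under strong monotonicity
    delta = t ** (-0.25)     # shrinking exploration radius (illustrative schedule)
    s = rng.choice([-1.0, 1.0], size=2)   # each player's random perturbation sign
    x_play = x + delta * s                # the joint action actually played
    # Each player sees only its own scalar reward at the perturbed joint action
    # and forms a one-point gradient estimate:
    #   E[reward(i, x + delta*s) * s_i / delta] = du_i/dx_i(x) + O(delta),
    # since E[s_i] = 0 and E[s_i * s_j] = 1{i = j}.
    g = np.array([reward(i, x_play) * s[i] / delta for i in range(2)])
    x = np.clip(x + eta * g, -2.0, 2.0)   # projected bandit gradient ascent step

print(x)  # the last iterate drifts toward the Nash equilibrium (0, 0)
```

In this sketch no player ever sees a gradient or the other player's action, only its own realized reward, which is the bandit-feedback setting the abstract refers to; the paper's contribution is a barrier-based algorithm whose regret and last-iterate convergence rates are optimal in this setting.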