Near-optimal no-regret algorithms for zero-sum games
成果类型:
Article
署名作者:
Daskalakis, Constantinos; Deckelbaum, Alan; Kim, Anthony
署名单位:
Massachusetts Institute of Technology (MIT); Massachusetts Institute of Technology (MIT); Stanford University
刊物名称:
GAMES AND ECONOMIC BEHAVIOR
ISSN/ISSBN:
0899-8256
DOI:
10.1016/j.geb.2014.01.003
发表日期:
2015
页码:
327-348
关键词:
Zero-sum games
repeated games
learning
No-regret dynamics
摘要:
We propose a new no-regret learning algorithm. When used against an adversary, our algorithm achieves average regret that scales optimally as 0 (1/root T) with the number T of rounds. However, when our algorithm is used by both players of a zero-sum game, their average regret scales as 0 (In T/T), guaranteeing a near-linear rate of convergence to the value of the game. This repreSents an almost-quadratic improvement on the rate of convergence to the value of a zero-sum game known to be achievable by any no-regret learning algorithm. Moreover, it is essentially optimal as we also show a lower bound of Omega(1/T) for all distributed dynamics, as long as the players do not know their payoff matrices in the beginning of the dynamics. (If they did, they could privately compute minimax strategies and play them ad infinitum.) 2014 Elsevier Inc. All rights reserved.