Convergent multiple-timescales reinforcement learning algorithms in normal form games

成果类型:
Article
署名作者:
Leslie, DS; Collins, EJ
署名单位:
University of Bristol
刊物名称:
ANNALS OF APPLIED PROBABILITY
ISSN/ISSBN:
1050-5164
发表日期:
2003
页码:
1231-1251
关键词:
stochastic-approximation mixed equilibria DYNAMICS
摘要:
We consider reinforcement learning algorithms in normal form games. Using two-timescales stochastic approximation, we introduce a model-free algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudo-trajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely to Nash distribution in two-player zero-sum games and N-player partnership games. However, there are simple games for which these, and most other adaptive processes, fail to converge-in particular, we consider the N-player matching pennies game and Shapley's variant of the rock-scissors-paper game. By extending stochastic approximation results to multiple timescales we can allow each player to learn at a different rate. We show that this extension will converge for two-player zero-sum games and two-player partnership games, as well as for the two special cases we consider.