Transient and asymptotic dynamics of reinforcement learning in games
Publication type:
Article
Authors:
Izquierdo, Luis R.; Izquierdo, Segismundo S.; Gotts, Nicholas M.; Polhill, J. Gary
Affiliations:
Universidad de Burgos; Universidad de Valladolid; James Hutton Institute
Journal:
GAMES AND ECONOMIC BEHAVIOR
ISSN/ISBN:
0899-8256
DOI:
10.1016/j.geb.2007.01.005
Publication year:
2007
Pages:
259-276
Keywords:
Reinforcement learning
Bush and Mosteller
Learning in games
Stochastic approximation
Slow learning
Distance diminishing
Abstract:
Reinforcement learners tend to repeat actions that led to satisfactory outcomes in the past and to avoid choices that resulted in unsatisfactory experiences. This behavior is one of the most widespread adaptation mechanisms in nature. In this paper we fully characterize the dynamics of one of the best-known stochastic models of reinforcement learning [Bush, R., Mosteller, F., 1955. Stochastic Models for Learning. Wiley & Sons, New York] for 2-player 2-strategy games. We also provide some extensions for more general games and for a wider class of learning algorithms. Specifically, it is shown that the transient dynamics of Bush and Mosteller's model can be substantially different from its asymptotic behavior. It is also demonstrated that in general, and in sharp contrast to other reinforcement learning models in the literature, the asymptotic dynamics of Bush and Mosteller's model cannot be approximated using the continuous-time limit version of its expected motion. (C) 2007 Elsevier Inc. All rights reserved.
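For readers unfamiliar with the model the abstract describes, below is a minimal sketch of a single Bush-Mosteller update step as it is commonly formulated: the stimulus is the payoff measured against an aspiration level and normalized to [-1, 1], and the probability of the action just played moves toward 1 after a satisfactory outcome and toward 0 after an unsatisfactory one. The function name, parameter names, and the normalization constant here are illustrative, not the paper's exact notation.

```python
def bm_update(p, payoff, aspiration, learning_rate, payoff_range):
    """One Bush-Mosteller update of the probability p of the action just played.

    stimulus > 0 (satisfactory outcome) pushes p toward 1;
    stimulus < 0 (unsatisfactory outcome) pushes p toward 0.
    payoff_range is an upper bound on |payoff - aspiration|, so stimulus is in [-1, 1].
    """
    stimulus = (payoff - aspiration) / payoff_range
    if stimulus >= 0:
        return p + learning_rate * stimulus * (1.0 - p)
    return p + learning_rate * stimulus * p

# A satisfactory outcome (payoff above aspiration) raises the probability
# of repeating the action: 0.5 -> 0.625 with these illustrative numbers.
p_up = bm_update(p=0.5, payoff=4.0, aspiration=2.0, learning_rate=0.5, payoff_range=4.0)

# An unsatisfactory outcome lowers it: 0.5 -> 0.375.
p_down = bm_update(p=0.5, payoff=0.0, aspiration=2.0, learning_rate=0.5, payoff_range=4.0)
```

Note that the update is applied to the realized action's probability each period, so with a small learning rate the process moves slowly and its expected motion can be studied via stochastic approximation, which is the setting in which the paper contrasts transient and asymptotic behavior.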
Source URL: