A Generalized Minimax Q-Learning Algorithm for Two-Player Zero-Sum Stochastic Games

Publication type:

Article

Authors:

Diddigi, Raghuram Bharadwaj; Kamanchi, Chandramouli; Bhatnagar, Shalabh

Affiliation:

Indian Institute of Science (IISC) - Bangalore

Journal:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL

ISSN/ISBN:

0018-9286

DOI:

10.1109/TAC.2022.3159453

Publication date:

2022

Pages:

4816-4823

Keywords:

games; Q-learning; game theory; Markov processes; convergence; standards; computational modeling; minimax Q-learning; successive relaxation; two-player zero-sum games

Abstract:

We consider the problem of two-player zero-sum games, formulated in this article as a min-max Markov game. The solution of this game, the min-max payoff starting from a given state, is called the min-max value of that state. We compute this solution using successive relaxation, a technique previously applied to obtain a faster value iteration algorithm for Markov decision processes, and extend it to the setting of two-player zero-sum games. We show that, under a special structure on the game, this technique facilitates faster computation of the min-max value of the states. We then derive a generalized minimax Q-learning algorithm, which computes the optimal policy when the model information is not known. Finally, we prove the convergence of the proposed generalized minimax Q-learning algorithm utilizing stochastic approximation techniques, under an assumption on the boundedness of iterates. Through experiments, we demonstrate the
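
The abstract does not state the update rule, so the following is only a minimal sketch of what a relaxed minimax Q-learning step might look like. It assumes the relaxation parameter `w` blends the usual bootstrapped target with the matrix-game value of the current state, in the style of successive over-relaxation; that form, the restriction to two actions per player, and all function and parameter names are illustrative assumptions, not the authors' notation. With `w = 1` the step reduces to standard minimax Q-learning.

```python
def matrix_game_value_2x2(a):
    """Value of a 2x2 zero-sum matrix game.

    The row player maximizes and the column player minimizes.
    """
    maximin = max(min(row) for row in a)
    minimax = min(max(a[0][c], a[1][c]) for c in range(2))
    if maximin == minimax:
        # A pure-strategy saddle point exists.
        return maximin
    (p, q), (r, s) = a
    # Closed-form mixed-strategy value for a 2x2 game without a saddle point.
    return (p * s - q * r) / (p + s - q - r)


def relaxed_minimax_q_update(q, state, u, v, reward, next_state,
                             step, gamma, w):
    """One sample-based update of q[state][u][v], in place.

    ASSUMPTION: the target is
        w * (reward + gamma * val(q[next_state])) + (1 - w) * val(q[state]),
    a successive-relaxation-style blend chosen for illustration.
    """
    target = (w * (reward + gamma * matrix_game_value_2x2(q[next_state]))
              + (1.0 - w) * matrix_game_value_2x2(q[state]))
    q[state][u][v] += step * (target - q[state][u][v])


if __name__ == "__main__":
    # Hypothetical two-state table: state 0 is all zeros,
    # state 1 holds a matching-pennies payoff matrix.
    q_table = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[1.0, -1.0], [-1.0, 1.0]],
    ]
    relaxed_minimax_q_update(q_table, state=0, u=0, v=1, reward=1.0,
                             next_state=1, step=0.5, gamma=0.9, w=1.2)
    print(q_table[0][0][1])
```

A general implementation would replace the 2x2 closed form with a linear program over mixed strategies, and would restrict `w` to whatever range the game's structure permits; the abstract only says that such a structural condition exists.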
