Approximate Policy Iteration for Robust Stochastic Control of Multiagent Markov Decision Processes
Publication type:
Article
Authors:
Huang, Feng; Cao, Ming; Wang, Long
Affiliations:
Peking University; University of Groningen
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2024.3510596
Publication year:
2025
Pages:
3587-3602
Keywords:
Games
Uncertainty
Convergence
Stochastic processes
Heuristic algorithms
Approximation algorithms
Multi-agent systems
Vectors
Artificial intelligence
Markov decision processes (MDPs)
Multiagent learning
Robust stochastic control
Sequential social dilemmas
Abstract:
In stochastic dynamic environments, multiagent Markov decision processes have emerged as a versatile paradigm for studying the sequential decision-making problems of fully cooperative multiagent systems. However, the optimality of the derived policies is usually sensitive to the model parameters, which are typically unknown and must be estimated from noisy data in practice. To investigate the sensitivity of optimal policies to these uncertain parameters, we study a robust stochastic control problem for multiagent Markov decision processes in which all agents jointly act as a centralized controller seeking to maximize the team's long-term return while the uncertainty acts as a disturbance opposing this goal, and we provide a solution concept of robust team optimality for the agents' decisions. To find such a solution, we develop a robust iterative policy-learning algorithm for all agents and present its convergence analysis. Compared with robust dynamic programming, this algorithm not only converges faster but also admits approximate computations that reduce the required computational resources. Moreover, numerical simulations demonstrate the effectiveness of the algorithm by extending the model of sequential social dilemmas to uncertain scenarios.
Source URL:
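
To give a concrete picture of the max-min formulation described in the abstract, the following is a minimal Python sketch of robust policy iteration on a toy single-controller MDP. It is an illustration only, not the paper's algorithm: the problem sizes, reward values, L1-ball uncertainty set around a nominal transition kernel, and fixed-sweep approximate evaluation are all assumptions introduced here, and the paper's multiagent structure and convergence guarantees are not reproduced.

import numpy as np

# Hypothetical toy robust MDP (sizes and values are illustrative only).
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 3, 0.9
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))        # team reward r(s, a)
P_nominal = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
radius = 0.1  # L1 budget of the uncertainty set around the nominal kernel

def worst_case_value(p_nom, v):
    """Adversarial expected value within an L1 ball: shift probability
    mass from the best next state to the worst one, up to the budget."""
    p = p_nom.copy()
    shift = min(radius / 2, p[np.argmax(v)])
    p[np.argmax(v)] -= shift
    p[np.argmin(v)] += shift
    return p @ v

# Robust policy iteration: evaluate each policy against the worst-case
# kernel, then improve greedily; stop when the policy is stable.
policy = np.zeros(n_states, dtype=int)
for _ in range(100):
    # Approximate robust policy evaluation (fixed number of backups).
    v = np.zeros(n_states)
    for _ in range(200):
        v = np.array([R[s, policy[s]]
                      + gamma * worst_case_value(P_nominal[s, policy[s]], v)
                      for s in range(n_states)])
    # Robust policy improvement.
    q = np.array([[R[s, a] + gamma * worst_case_value(P_nominal[s, a], v)
                   for a in range(n_actions)] for s in range(n_states)])
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("robust policy:", policy)

In the paper's setting, the worst-case step would range over the actual uncertainty set of the multiagent model, and the evaluation step may itself be approximated, which is where the "approximate" in approximate policy iteration enters.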