Approximate Policy Iteration for Robust Stochastic Control of Multiagent Markov Decision Processes

Publication Type:
Article
Authors:
Huang, Feng; Cao, Ming; Wang, Long
Affiliations:
Peking University; University of Groningen
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2024.3510596
Publication Date:
2025
Pages:
3587-3602
Keywords:
games; uncertainty; convergence; stochastic processes; heuristic algorithms; approximation algorithms; multi-agent systems; vectors; Markov decision processes (MDPs); artificial intelligence; multiagent learning; robust stochastic control; sequential social dilemmas
Abstract:
In stochastic dynamic environments, multiagent Markov decision processes have emerged as a versatile paradigm for studying sequential decision-making in fully cooperative multiagent systems. However, the optimality of the derived policies is usually sensitive to the model parameters, which are typically unknown in practice and must be estimated from noisy data. To investigate the sensitivity of optimal policies to these uncertain parameters, we study a robust stochastic control problem for multiagent Markov decision processes in which all agents form a centralized controller that seeks to maximize the long-term return of the team while the uncertainty acts as a disturbance against this goal, and we introduce robust team optimality as a solution concept for the agents' decisions. To compute such a solution, we develop a robust iterative policy-learning algorithm for all agents and present its convergence analysis. Compared with robust dynamic programming, this algorithm not only converges faster but also admits approximate calculations that reduce the required computational resources. Moreover, numerical simulations, obtained by extending the model of sequential social dilemmas to uncertain scenarios, demonstrate the effectiveness of the algorithm.
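Note: the abstract describes a policy-iteration-style method for robust control but gives no algorithmic details. The sketch below illustrates only the generic idea of robust policy iteration (worst-case policy evaluation alternated with greedy improvement) on a finite MDP whose transition kernel lies in a finite uncertainty set; it is a toy under these simplifying assumptions, not the authors' algorithm, and all names (robust_policy_iteration, P_set, R) are hypothetical.

import numpy as np

def robust_policy_iteration(P_set, R, gamma=0.9, tol=1e-8, max_iter=1000):
    # P_set: (K, S, A, S) array of K candidate transition kernels
    #        (a finite uncertainty set; an illustrative assumption, since
    #        the paper's setting is more general).
    # R:     (S, A) array of immediate team rewards.
    K, S, A, _ = P_set.shape
    policy = np.zeros(S, dtype=int)
    V = np.zeros(S)
    for _ in range(max_iter):
        # Robust policy evaluation: iterate the worst-case Bellman
        # operator for the current policy to a fixed point.
        while True:
            # One-step return of the current policy under each kernel k.
            Q_pi = R[np.arange(S), policy] + gamma * np.einsum(
                "ksz,z->ks", P_set[:, np.arange(S), policy, :], V
            )
            V_new = Q_pi.min(axis=0)  # adversarial choice of kernel
            converged = np.max(np.abs(V_new - V)) < tol
            V = V_new
            if converged:
                break
        # Robust policy improvement: greedy w.r.t. worst-case Q-values.
        Q = (R[None] + gamma * np.einsum("ksaz,z->ksa", P_set, V)).min(axis=0)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break  # stable policy: robust-optimal within this toy model
        policy = new_policy
    return policy, V

# Usage on a small random instance (4 states, 2 actions, 3 kernels):
rng = np.random.default_rng(0)
P = rng.random((3, 4, 2, 4))
P /= P.sum(axis=-1, keepdims=True)  # normalize rows into distributions
R = rng.random((4, 2))
pi, V = robust_policy_iteration(P, R)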
Source URL: