Approximate Policy Iteration for Robust Stochastic Control of Multiagent Markov Decision Processes
Publication type:
Article
Authors:
Huang, Feng; Cao, Ming; Wang, Long
Affiliations:
Peking University; University of Groningen
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2024.3510596
Publication year:
2025
Pages:
3587-3602
Keywords:
Games
Uncertainty
Convergence
Stochastic processes
Heuristic algorithms
Approximation algorithms
Multi-agent systems
Vectors
Artificial intelligence
Markov decision processes (MDPs)
Multiagent learning
Robust stochastic control
Sequential social dilemmas
Abstract:
In stochastic dynamic environments, multiagent Markov decision processes have emerged as a versatile paradigm for studying the sequential decision-making problems of fully cooperative multiagent systems. However, the optimality of the derived policies is usually sensitive to the model parameters, which are typically unknown and must be estimated from noisy data in practice. To investigate the sensitivity of optimal policies to these uncertain parameters, we study a robust stochastic control problem for multiagent Markov decision processes in which all agents jointly act as a centralized controller seeking to maximize the team's long-term return while the uncertainty acts as a disturbance opposing this goal, and we provide a solution concept of robust team optimality for the agents' decisions. To find such a solution, we develop a robust iterative policy-learning algorithm for all agents and present its convergence analysis. Compared with robust dynamic programming, this algorithm not only converges faster but also admits approximate computations that reduce the required computational resources. Moreover, numerical simulations demonstrate the effectiveness of the algorithm by extending the model of sequential social dilemmas to uncertain scenarios.
Source URL:
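
To give a concrete picture of the max-min formulation described in the abstract, the following is a minimal Python sketch of robust policy iteration on a toy single-controller MDP. It is an illustration only, not the paper's algorithm: the problem sizes, reward values, L1-ball uncertainty set around a nominal transition kernel, and fixed-sweep approximate evaluation are all assumptions introduced here, and the paper's multiagent structure and convergence guarantees are not reproduced.

import numpy as np

# Hypothetical toy robust MDP (sizes and values are illustrative only).
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 3, 0.9
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))        # team reward r(s, a)
P_nominal = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
radius = 0.1  # L1 budget of the uncertainty set around the nominal kernel

def worst_case_value(p_nom, v):
    """Adversarial expected value within an L1 ball: shift probability
    mass from the best next state to the worst one, up to the budget."""
    p = p_nom.copy()
    shift = min(radius / 2, p[np.argmax(v)])
    p[np.argmax(v)] -= shift
    p[np.argmin(v)] += shift
    return p @ v

# Robust policy iteration: evaluate each policy against the worst-case
# kernel, then improve greedily; stop when the policy is stable.
policy = np.zeros(n_states, dtype=int)
for _ in range(100):
    # Approximate robust policy evaluation (fixed number of backups).
    v = np.zeros(n_states)
    for _ in range(200):
        v = np.array([R[s, policy[s]]
                      + gamma * worst_case_value(P_nominal[s, policy[s]], v)
                      for s in range(n_states)])
    # Robust policy improvement.
    q = np.array([[R[s, a] + gamma * worst_case_value(P_nominal[s, a], v)
                   for a in range(n_actions)] for s in range(n_states)])
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("robust policy:", policy)

In the paper's setting, the worst-case step would range over the actual uncertainty set of the multiagent model, and the evaluation step may itself be approximated, which is where the "approximate" in approximate policy iteration enters.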