Dynamic Programming Principles for Mean-Field Controls with Learning
Publication Type:
Article
Authors:
Gu, Haotian; Guo, Xin; Wei, Xiaoli; Xu, Renyuan
Affiliations:
University of California System; University of California Berkeley; University of California System; University of California Berkeley; Tsinghua Shenzhen International Graduate School; University of Southern California
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.2022.2395
Publication Year:
2023
Pages:
1040-1054
Keywords:
model
time
Abstract:
The dynamic programming principle (DPP) is fundamental for control and optimization, including Markov decision problems (MDPs), reinforcement learning (RL), and, more recently, mean-field controls (MFCs). In the learning framework of MFCs, however, the DPP has not been rigorously established, despite its critical importance for algorithm design. In this paper, we first present a simple example of MFCs with learning in which the DPP fails with a misspecified Q function, and we then propose the correct form of the Q function in an appropriate space for MFCs with learning. This form of the Q function differs from the classical one and is called the IQ function. In the special case where the transition probability and the reward are independent of the mean-field information, it integrates the classical Q function for single-agent RL over the state-action distribution. In other words, MFCs with learning can be viewed as lifting classical RL by replacing the state-action space with its space of probability distributions. This identification of the IQ function enables us to establish the DPP precisely in the learning framework of MFCs. Finally, we illustrate the time consistency of this IQ function through numerical experiments.
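A minimal sketch of the relation described in the abstract, in notation introduced here only for illustration (the paper's own symbols may differ): letting $Q(x,a)$ denote the classical single-agent Q function and $\nu$ a state-action distribution, the special case where dynamics and reward do not depend on the mean-field information reads

\[
  Q_{\mathrm{IQ}}(\nu) \;=\; \int_{\mathcal{X}\times\mathcal{A}} Q(x,a)\,\nu(\mathrm{d}x,\mathrm{d}a),
  \qquad \nu \in \mathcal{P}(\mathcal{X}\times\mathcal{A}),
\]

so the DPP is then posed on the lifted space $\mathcal{P}(\mathcal{X}\times\mathcal{A})$ of state-action distributions rather than on $\mathcal{X}\times\mathcal{A}$ itself.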