Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games

Publication Type:
Article
Authors:
Yan, Yuling; Li, Gen; Chen, Yuxin; Fan, Jianqing
Affiliations:
Massachusetts Institute of Technology (MIT); Chinese University of Hong Kong; University of Pennsylvania; Princeton University
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.2022.0342
Publication Year:
2024
Pages:
2430-2445
Keywords:
sample complexity; performance bounds
Abstract:
This paper makes progress toward learning Nash equilibria in two-player, zero-sum Markov games from offline data. Specifically, consider a gamma-discounted, infinite-horizon Markov game with S states, in which the max-player has A actions and the min-player has B actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds, called value iteration with lower confidence bounds for zero-sum Markov games, that provably finds an epsilon-approximate Nash equilibrium with a sample complexity no larger than C*_clipped S(A+B) / ((1-gamma)^3 epsilon^2) (up to some log factor). Here, C*_clipped is a unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-a-vis the target data), and the target accuracy epsilon can be any value within (0, 1/(1-gamma)]. Our sample complexity bound improves on prior art by a factor of min{A, B}, achieving minimax optimality for a broad regime of interest. An appealing feature of our result lies in its algorithmic simplicity, which shows that neither variance reduction nor sample splitting is needed to achieve sample optimality.
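To make the algorithmic idea concrete, the following is a minimal Python sketch of one pessimistic value-iteration sweep with a Bernstein-style lower confidence bound for a zero-sum Markov game. It is an illustration under stated assumptions, not the paper's published algorithm: the names (pessimistic_update, matrix_game_value, P_hat, N, c_b) and the exact form of the penalty are hypothetical choices made here for readability.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q):
    """Max-player's maximin value of an A x B payoff matrix Q, via a linear program."""
    A, B = Q.shape
    # Variables: [v, x_1, ..., x_A]; maximize v  <=>  minimize -v.
    c = np.concatenate(([-1.0], np.zeros(A)))
    # For every min-player column b: v - sum_a x_a Q[a, b] <= 0.
    A_ub = np.hstack((np.ones((B, 1)), -Q.T))
    b_ub = np.zeros(B)
    # x lies on the probability simplex: sum_a x_a = 1, x_a >= 0.
    A_eq = np.concatenate(([0.0], np.ones(A))).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0.0, 1.0)] * A
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[0]

def pessimistic_update(r, P_hat, N, V, gamma, c_b=1.0):
    """One value-iteration sweep using Bernstein-style lower confidence bounds.

    r:     rewards, shape (S, A, B), assumed to lie in [0, 1]
    P_hat: empirical transition kernel estimated from offline data, shape (S, A, B, S)
    N:     visit counts per (state, max-action, min-action), shape (S, A, B)
    V:     current value estimate for the max-player, shape (S,)
    c_b:   confidence-width constant (illustrative; would carry log factors in theory)
    """
    EV = P_hat @ V                                        # empirical mean of V(s'), shape (S, A, B)
    var = np.maximum(P_hat @ (V ** 2) - EV ** 2, 0.0)     # empirical variance of V(s')
    # Bernstein-style penalty: larger when counts are small or the variance is high.
    penalty = c_b * (np.sqrt(var / np.maximum(N, 1))
                     + 1.0 / ((1.0 - gamma) * np.maximum(N, 1)))
    # Pessimistic (lower-confidence-bound) Q estimate, clipped to the valid range.
    Q_lcb = np.clip(r + gamma * EV - penalty, 0.0, 1.0 / (1.0 - gamma))
    # New value at each state: maximin value of the pessimistic payoff matrix.
    S = P_hat.shape[0]
    return np.array([matrix_game_value(Q_lcb[s]) for s in range(S)])
```

In this sketch, pessimism enters only through subtracting the data-dependent penalty from the empirical Q estimate before solving each state's matrix game; no variance-reduction machinery or sample splitting is used, in line with the simplicity highlighted in the abstract.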