Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Publication Type:
Article
Authors:
Li, Gen; Wei, Yuting; Chi, Yuejie; Chen, Yuxin
Affiliations:
University of Pennsylvania; Carnegie Mellon University; University of Pennsylvania
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.2023.2451
Publication Date:
2024
Keywords:
Bounds; rates
Abstract:
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider gamma-discounted infinite-horizon Markov decision processes (MDPs) with state space S and action space A. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least |S||A|/(1-gamma)^2. The current paper overcomes this barrier by certifying the minimax optimality of two algorithms (a perturbed model-based algorithm and a conservative model-based algorithm) as soon as the sample size exceeds the order of |S||A|/(1-gamma) (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time-inhomogeneous finite-horizon MDPs and prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information-theoretically infeasible).
Source URL:
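
Below is a minimal sketch of the generic plug-in (model-based) approach referenced in the abstract: draw a fixed number of next-state samples per (state, action) pair from the generative model, form an empirical transition kernel, and plan in the resulting empirical MDP via value iteration. The function names (empirical_mdp, plan_by_value_iteration), the simulator callback sample_next_state, and all parameter choices are illustrative assumptions; this is not the paper's exact perturbed or conservative algorithm.

```python
import numpy as np

def empirical_mdp(sample_next_state, num_states, num_actions, num_samples):
    """Estimate the transition kernel by drawing `num_samples` next-state
    samples per (state, action) pair from the generative model."""
    P_hat = np.zeros((num_states, num_actions, num_states))
    for s in range(num_states):
        for a in range(num_actions):
            for _ in range(num_samples):
                P_hat[s, a, sample_next_state(s, a)] += 1.0
    return P_hat / num_samples

def plan_by_value_iteration(P_hat, rewards, gamma, num_iters=1000):
    """Plan in the empirical MDP (P_hat, rewards) via standard value iteration."""
    Q = np.zeros_like(rewards)                 # rewards has shape (S, A)
    for _ in range(num_iters):
        V = Q.max(axis=1)                      # greedy value estimate
        Q = rewards + gamma * (P_hat @ V)      # Bellman optimality update
    return Q.argmax(axis=1), Q                 # greedy policy and its Q-values
```

The paper's perturbed and conservative model-based algorithms build on this plug-in template (e.g., by slightly perturbing the rewards before planning); the sketch only illustrates the basic construction whose sample complexity the paper analyzes.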