Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Publication Type:
Article
Authors:
Li, Gen; Wei, Yuting; Chi, Yuejie; Chen, Yuxin
Affiliations:
University of Pennsylvania; Carnegie Mellon University; University of Pennsylvania
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.2023.2451
Publication Date:
2024
Keywords:
Bounds; rates
Abstract:
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider gamma-discounted infinite-horizon Markov decision processes (MDPs) with state space S and action space A. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least |S||A|/(1-gamma)^2. The current paper overcomes this barrier by certifying the minimax optimality of two algorithms (a perturbed model-based algorithm and a conservative model-based algorithm) as soon as the sample size exceeds the order of |S||A|/(1-gamma) (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time-inhomogeneous finite-horizon MDPs and prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information-theoretically infeasible).
Source URL:
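
Below is a minimal sketch of the generic plug-in (model-based) approach referenced in the abstract: draw a fixed number of next-state samples per (state, action) pair from the generative model, form an empirical transition kernel, and plan in the resulting empirical MDP via value iteration. The function names (empirical_mdp, plan_by_value_iteration), the simulator callback sample_next_state, and all parameter choices are illustrative assumptions; this is not the paper's exact perturbed or conservative algorithm.

```python
import numpy as np

def empirical_mdp(sample_next_state, num_states, num_actions, num_samples):
    """Estimate the transition kernel by drawing `num_samples` next-state
    samples per (state, action) pair from the generative model."""
    P_hat = np.zeros((num_states, num_actions, num_states))
    for s in range(num_states):
        for a in range(num_actions):
            for _ in range(num_samples):
                P_hat[s, a, sample_next_state(s, a)] += 1.0
    return P_hat / num_samples

def plan_by_value_iteration(P_hat, rewards, gamma, num_iters=1000):
    """Plan in the empirical MDP (P_hat, rewards) via standard value iteration."""
    Q = np.zeros_like(rewards)                 # rewards has shape (S, A)
    for _ in range(num_iters):
        V = Q.max(axis=1)                      # greedy value estimate
        Q = rewards + gamma * (P_hat @ V)      # Bellman optimality update
    return Q.argmax(axis=1), Q                 # greedy policy and its Q-values
```

The paper's perturbed and conservative model-based algorithms build on this plug-in template (e.g., by slightly perturbing the rewards before planning); the sketch only illustrates the basic construction whose sample complexity the paper analyzes.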