Mathematics of Operations Research, 2023, Issue 3

Provably Efficient Reinforcement Learning with Linear Function Approximation

Document type:

Article

Authors:

Jin, Chi; Yang, Zhuoran; Wang, Zhaoran; Jordan, Michael I.

Affiliations:

Princeton University; Yale University; Northwestern University; University of California System; University of California Berkeley

Journal:

MATHEMATICS OF OPERATIONS RESEARCH

ISSN/ISBN:

0364-765X

DOI:

10.1287/moor.2022.1309

Publication date:

2023

Pages:

1496-1521

Abstract:

Modern reinforcement learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of function approximation raises a fundamental set of challenges involving computational and statistical efficiency, especially given the need to manage the exploration/exploitation trade-off. As a result, a core RL question remains open: how can we design provably efficient RL algorithms that incorporate function approximation? This question persists even in a basic setting with linear dynamics and linear rewards, for which only linear function approximation is needed. This paper presents the first provable RL algorithm with both polynomial run time and polynomial sample complexity in this linear setting, without requiring a simulator or additional assumptions. Concretely, we prove that an optimistic modification of least-squares value iteration, a classical algorithm frequently studied in the linear setting, achieves $\tilde{O}(\sqrt{d^3 H^3 T})$ regret, where $d$ is the ambient dimension of feature space, $H$ is the length of each episode, and $T$ is the total number of steps. Importantly, such regret is independent of the number of states and actions.
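
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what an "optimistic modification of least-squares value iteration" could look like, under stated assumptions. It assumes the commonly described form in which each step fits a ridge regression on features of visited state-action pairs and adds an exploration bonus proportional to $\sqrt{\phi^\top \Lambda^{-1} \phi}$, clipping the estimate at $H$. The function names (`ridge_fit`, `optimistic_q`, `lsvi_backward_pass`), the regularizer `lam`, the bonus scale `beta`, and the data layout are all illustrative choices and are not taken from this record.

```python
import numpy as np


def ridge_fit(phi, targets, lam=1.0):
    """Ridge regression of targets on features.

    phi: (n, d) features of visited state-action pairs (assumed layout).
    Returns the weight vector and the inverse regularized Gram matrix.
    """
    d = phi.shape[1]
    gram = phi.T @ phi + lam * np.eye(d)
    gram_inv = np.linalg.inv(gram)
    w = gram_inv @ (phi.T @ targets)
    return w, gram_inv


def optimistic_q(w, gram_inv, feats, beta, H):
    """Linear prediction plus an uncertainty bonus, clipped at H.

    feats: (m, d). The bonus is beta * sqrt(phi^T Lambda^{-1} phi).
    """
    quad = np.einsum("ij,jk,ik->i", feats, gram_inv, feats)
    bonus = beta * np.sqrt(np.maximum(quad, 0.0))
    return np.minimum(feats @ w + bonus, H)


def lsvi_backward_pass(phi, rewards, next_phi, beta, H, lam=1.0):
    """One backward sweep over steps h = H-1, ..., 0.

    phi[h]: (n, d) features at step h; rewards[h]: (n,) observed rewards;
    next_phi[h]: (n, A, d) features of the next state for each of A actions.
    Returns a list of (weights, inverse Gram matrix), one pair per step.
    """
    params = [None] * H
    for h in reversed(range(H)):
        n = phi[h].shape[0]
        if h == H - 1:
            # No value remains after the final step of an episode.
            next_value = np.zeros(n)
        else:
            w_next, inv_next = params[h + 1]
            n_next, num_actions, d = next_phi[h].shape
            flat = next_phi[h].reshape(n_next * num_actions, d)
            q_next = optimistic_q(w_next, inv_next, flat, beta, H)
            # Greedy value of the next state under the optimistic estimate.
            next_value = q_next.reshape(n_next, num_actions).max(axis=1)
        params[h] = ridge_fit(phi[h], rewards[h] + next_value, lam)
    return params
```

In this sketch the bonus shrinks in directions of feature space that have been visited often: with three visits to a unit feature direction and `lam = 1.0`, the bonus on that direction is `beta / 2`. Every quantity is computed from $d$-dimensional features, which is in keeping with the abstract's statement that the regret does not depend on the number of states and actions.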