Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons
Publication type:
Article
Authors:
Shi, Chengchun; Luo, Shikai; Le, Yuan; Zhu, Hongtu; Song, Rui
Affiliations:
University of London; London School of Economics & Political Science; Shanghai University of Finance & Economics; University of North Carolina; University of North Carolina at Chapel Hill; North Carolina State University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2022.2106868
Publication date:
2024
Pages:
232-245
Keywords:
dynamic treatment regimes
robust estimation
inference
rates
Abstract:
We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most existing policy optimization algorithms in the computer science literature are developed in online settings where data are easy to collect or simulate. How they generalize to mobile health applications with a pre-collected offline dataset remains unknown. The aim of this paper is to develop a novel advantage learning framework that efficiently uses pre-collected data for policy optimization. The proposed method takes an optimal Q-estimator computed by any existing state-of-the-art RL algorithm as input, and outputs a new policy whose value is guaranteed to converge at a faster rate than that of the policy derived from the initial Q-estimator. Extensive numerical experiments are conducted to back up our theoretical findings. A Python implementation of our proposed method is available at https://github.com/leyuanheart/SEAL.
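To make the input/output interface described in the abstract concrete, below is a minimal illustrative sketch, not the authors' SEAL implementation (see the linked repository for that). It shows how a policy is derived greedily from a plug-in Q-estimator and how the advantage function A(s, a) = Q(s, a) - max_a' Q(s, a') is formed from it; the quadratic q_hat and the binary action set are hypothetical placeholders, and the actual SEAL procedure refines this plug-in policy to achieve the faster convergence rate claimed in the paper.

```python
import numpy as np

def q_hat(state: np.ndarray, action: int) -> float:
    """Hypothetical plug-in Q-estimator, e.g. produced by fitted Q-iteration."""
    return -((state.sum() - action) ** 2)

def greedy_policy(state: np.ndarray, actions=(0, 1)) -> int:
    """Derive a policy from the Q-estimator: pick the action maximizing Q(s, a)."""
    values = [q_hat(state, a) for a in actions]
    return actions[int(np.argmax(values))]

def advantage(state: np.ndarray, action: int, actions=(0, 1)) -> float:
    """Advantage of an action relative to the greedy choice under q_hat."""
    return q_hat(state, action) - max(q_hat(state, a) for a in actions)

if __name__ == "__main__":
    s = np.array([0.4, 0.3])
    print("greedy action:", greedy_policy(s))       # plug-in policy's choice
    print("advantage of action 0:", advantage(s, 0))  # <= 0 by construction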