Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons
Publication type:
Article
Authors:
Shi, Chengchun; Luo, Shikai; Le, Yuan; Zhu, Hongtu; Song, Rui
Affiliations:
University of London; London School of Economics & Political Science; Shanghai University of Finance & Economics; University of North Carolina; University of North Carolina at Chapel Hill; North Carolina State University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2022.2106868
Publication date:
2024
Pages:
232-245
Keywords:
dynamic treatment regimes
robust estimation
inference
rates
Abstract:
We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most existing policy optimization algorithms in the computer science literature are developed in online settings where data are easy to collect or simulate. How they generalize to mobile health applications with a pre-collected offline dataset remains unknown. The aim of this paper is to develop a novel advantage learning framework that efficiently uses pre-collected data for policy optimization. The proposed method takes an optimal Q-estimator computed by any existing state-of-the-art RL algorithm as input, and outputs a new policy whose value is guaranteed to converge at a faster rate than that of the policy derived from the initial Q-estimator. Extensive numerical experiments are conducted to back up our theoretical findings. A Python implementation of our proposed method is available at https://github.com/leyuanheart/SEAL.
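To make the input/output interface described in the abstract concrete, below is a minimal illustrative sketch, not the authors' SEAL implementation (see the linked repository for that). It shows how a policy is derived greedily from a plug-in Q-estimator and how the advantage function A(s, a) = Q(s, a) - max_a' Q(s, a') is formed from it; the quadratic q_hat and the binary action set are hypothetical placeholders, and the actual SEAL procedure refines this plug-in policy to achieve the faster convergence rate claimed in the paper.

```python
import numpy as np

def q_hat(state: np.ndarray, action: int) -> float:
    """Hypothetical plug-in Q-estimator, e.g. produced by fitted Q-iteration."""
    return -((state.sum() - action) ** 2)

def greedy_policy(state: np.ndarray, actions=(0, 1)) -> int:
    """Derive a policy from the Q-estimator: pick the action maximizing Q(s, a)."""
    values = [q_hat(state, a) for a in actions]
    return actions[int(np.argmax(values))]

def advantage(state: np.ndarray, action: int, actions=(0, 1)) -> float:
    """Advantage of an action relative to the greedy choice under q_hat."""
    return q_hat(state, action) - max(q_hat(state, a) for a in actions)

if __name__ == "__main__":
    s = np.array([0.4, 0.3])
    print("greedy action:", greedy_policy(s))       # plug-in policy's choice
    print("advantage of action 0:", advantage(s, 0))  # <= 0 by construction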