Generalized Policy Improvement Algorithms With Theoretically Supported Sample Reuse

Type:
Article
Authors:
Queeney, James; Paschalidis, Ioannis Ch.; Cassandras, Christos G.
Affiliations:
Boston University
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2024.3454011
Publication Date:
2025
Pages:
1236-1243
Keywords:
Approximation algorithms; Optimization; TV; Training; Task analysis; Trajectory; Heuristic algorithms; Policy improvement; policy optimization; reinforcement learning (RL); sample reuse
Abstract:
We develop a new class of model-free deep reinforcement learning algorithms for data-driven, learning-based control. Our Generalized Policy Improvement algorithms combine the policy improvement guarantees of on-policy methods with the efficiency of sample reuse, addressing a tradeoff between two important deployment requirements for real-world control: 1) practical performance guarantees; and 2) data efficiency. We demonstrate the benefits of this new class of algorithms through extensive experimental analysis on a broad range of simulated control tasks.
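The abstract's central idea, reusing samples from several recent policies while retaining an on-policy-style improvement objective, can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's actual Generalized Policy Improvement objective: it evaluates a PPO-style clipped surrogate where each sample's log-probability comes from whichever past policy generated it, so data pooled from multiple recent policies can enter one update. The function name `gpi_surrogate` and all arrays are hypothetical.

```python
import numpy as np

def gpi_surrogate(adv, logp_new, logp_old, eps=0.2):
    """Clipped surrogate over samples pooled from several past policies.

    Illustrative sketch only (not the paper's exact objective):
    `logp_old` holds the log-probability of each action under the
    behavior policy that generated that particular sample, which is
    what allows data from the last few policies to be reused.
    """
    # Per-sample importance ratio between current and generating policy.
    ratio = np.exp(logp_new - logp_old)
    # Clipping limits how far reused (off-policy) samples can push the update.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return float(np.mean(np.minimum(ratio * adv, clipped * adv)))

# Toy batch: four samples pooled from two earlier behavior policies.
adv = np.array([1.0, -0.5, 2.0, 0.3])            # advantage estimates
logp_old = np.array([-1.0, -0.7, -1.2, -0.9])    # per-sample generating policy
logp_new = np.array([-0.9, -0.8, -1.0, -0.9])    # current policy
print(gpi_surrogate(adv, logp_new, logp_old))
```

When `logp_new` equals `logp_old` the ratios are all one and the surrogate reduces to the mean advantage, recovering the purely on-policy case; the clipping term is one common way to keep the off-policy correction bounded when samples are reused.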