New Versions of Gradient Temporal-Difference Learning
Publication type:
Article
Authors:
Lee, Donghwan; Lim, Han-Dong; Park, Jihoon; Choi, Okyong
Affiliation:
Korea Advanced Institute of Science & Technology (KAIST)
Journal:
IEEE Transactions on Automatic Control
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2022.3213763
Publication date:
2023
Pages:
5006-5013
Keywords:
convergence
optimization
reinforcement learning (RL)
saddle-point problem
stability
temporal-difference (TD) learning
Abstract:
Sutton, Szepesvári, and Maei introduced the first gradient temporal-difference (GTD) learning algorithms compatible with both linear function approximation and off-policy training. The goal of this article is 1) to propose several variants of GTD algorithms, together with an extensive comparative analysis, and 2) to establish new theoretical analysis frameworks for GTD algorithms. The variants are based on convex-concave saddle-point interpretations of GTD algorithms, which unify all of them in a single framework and yield a simple stability analysis based on recent results on primal-dual gradient dynamics. Finally, a numerical comparative analysis is given to evaluate the new approaches.
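
For orientation only, here is a minimal sketch of the kind of algorithm the abstract refers to. It shows one update of GTD2, one of the original GTD methods with linear function approximation, and not the new variants proposed in the article. The function names (`gtd2_step`, `run_demo`), the step sizes, and the two-state chain are assumptions made for illustration. Importance-sampling weights for off-policy training are omitted, so the sketch assumes that transitions are sampled under the policy being evaluated.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def gtd2_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 update; returns new (theta, w) without mutating the inputs.

    theta: weights of the linear value estimate, V(s) ~ theta . phi(s)
    w:     auxiliary weights, updated alongside theta
    """
    # TD error for the observed transition
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    phi_w = dot(phi, w)
    # Primary update: theta += alpha * (phi - gamma * phi') * (phi . w)
    new_theta = [t + alpha * (p - gamma * pn) * phi_w
                 for t, p, pn in zip(theta, phi, phi_next)]
    # Auxiliary update: w += beta * (delta - phi . w) * phi
    new_w = [wi + beta * (delta - phi_w) * p for wi, p in zip(w, phi)]
    return new_theta, new_w


def run_demo(steps=5000, seed=0):
    """Run GTD2 on a hypothetical two-state chain with one-hot features."""
    rng = random.Random(seed)
    features = [[1.0, 0.0], [0.0, 1.0]]  # one-hot (tabular) features
    rewards = [1.0, 0.0]                 # reward depends on the current state
    theta, w = [0.0, 0.0], [0.0, 0.0]
    state = 0
    for _ in range(steps):
        next_state = rng.randrange(2)    # uniform transitions (assumption)
        theta, w = gtd2_step(theta, w, features[state], features[next_state],
                             rewards[state], 0.5, 0.05, 0.05)
        state = next_state
    return theta


if __name__ == "__main__":
    print(run_demo())
```

With a discount factor of 0.5, the exact values of this toy chain are 1.5 and 0.5, which gives a reference point for the estimate returned by `run_demo`. The two coupled updates of `theta` and `w` are what the saddle-point interpretation mentioned in the abstract builds on.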