
New Versions of Gradient Temporal-Difference Learning

Type:

Article

Authors:

Lee, Donghwan; Lim, Han-Dong; Park, Jihoon; Choi, Okyong

Affiliation:

Korea Advanced Institute of Science & Technology (KAIST)

Journal:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL

ISSN/ISBN:

0018-9286

DOI:

10.1109/TAC.2022.3213763

Publication date:

2023

Pages:

5006-5013

Keywords:

convergence; optimization; reinforcement learning (RL); saddle-point problem; stability; temporal-difference (TD) learning

Abstract:

Sutton, Szepesvári, and Maei introduced the first gradient temporal-difference (GTD) learning algorithms compatible with both linear function approximation and off-policy training. The goal of this article is 1) to propose some variants of GTDs with extensive comparative analysis and 2) to establish new theoretical analysis frameworks for the GTDs. These variants are based on convex-concave saddle-point interpretations of GTDs, which effectively unify all the GTDs into a single framework, and provide simple stability analysis based on recent results on primal-dual gradient dynamics. Finally, numerical comparative analysis is given to evaluate the new approaches.
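
For readers unfamiliar with the algorithm family the abstract refers to, the following is a minimal sketch of one step of GTD2, one of the original GTD algorithms of Sutton et al., under linear function approximation. It is not taken from the article and does not implement any of the new variants it proposes. The function name `gtd2_step`, its parameter names, and the placement of the importance-sampling ratio `rho` are illustrative assumptions; other formulations place `rho` differently. The two coupled updates can be read as a primal step on `theta` and a dual step on `w`, which is the kind of structure a saddle-point interpretation builds on.

```python
import numpy as np


def gtd2_step(theta, w, phi, reward, phi_next, gamma, alpha, beta, rho=1.0):
    """One GTD2 update with linear value estimate v(s) = theta @ phi(s).

    Assumed (illustrative) form:
      delta = r + gamma * theta.phi' - theta.phi
      theta <- theta + alpha * rho * (phi - gamma * phi') * (phi.w)
      w     <- w + beta * (rho * delta - phi.w) * phi
    rho is the importance-sampling ratio; rho = 1 is the on-policy case.
    """
    # TD error computed with the current primal weights
    delta = reward + gamma * (theta @ phi_next) - (theta @ phi)
    # Auxiliary (dual) estimate projected onto the current feature
    phi_w = phi @ w
    # Both updates use the old values of theta and w (simultaneous update)
    theta_new = theta + alpha * rho * (phi - gamma * phi_next) * phi_w
    w_new = w + beta * (rho * delta - phi_w) * phi
    return theta_new, w_new


if __name__ == "__main__":
    # Toy check: a single state with a self-loop, reward 1, discount 0.5.
    # The true value is 1 / (1 - 0.5) = 2, and the feature is the scalar 1.
    theta = np.zeros(1)
    w = np.zeros(1)
    phi = np.array([1.0])
    for _ in range(2000):
        theta, w = gtd2_step(theta, w, phi, 1.0, phi, 0.5, 0.1, 0.1)
    print(theta, w)  # theta[0] approaches 2, w[0] approaches 0
```

On this one-state toy problem the iteration is a deterministic linear recursion, so it only illustrates the shape of the updates; it says nothing about the off-policy behaviour or the comparative results reported in the article.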