Sample Complexity and Overparameterization Bounds for Temporal-Difference Learning With Neural Network Approximation

Publication type:
Article
Authors:
Cayci, Semih; Satpathi, Siddhartha; He, Niao; Srikant, R.
Affiliations:
University of Illinois System; University of Illinois Urbana-Champaign; RWTH Aachen University; Mayo Clinic; Swiss Federal Institutes of Technology Domain; ETH Zurich
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2023.3234234
Publication date:
2023
Pages:
2891-2905
Keywords:
Neural networks; Approximation algorithms; Markov processes; Convergence; Complexity theory; Reinforcement learning; Kernel; Reinforcement learning (RL); Stochastic approximation; Temporal-difference (TD) learning
Abstract:
In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two algorithms used in practice, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
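
To make the algorithmic setup concrete, below is a minimal Python sketch of one step of neural TD(0) with max-norm regularization, in the spirit of the algorithms named in the abstract. All specifics are illustrative assumptions rather than the paper's exact construction: the two-layer ReLU parameterization, the step size alpha, the projection radius R, and the per-neuron projection onto a ball around the random initialization.

import numpy as np

# Minimal sketch (assumptions, not the paper's exact algorithm):
# two-layer ReLU network f(s; W) = (1/sqrt(m)) * sum_i c_i * relu(W_i . s)
# with a fixed output layer c and trainable input weights W.

rng = np.random.default_rng(0)
d, m = 4, 256                  # state dimension, network width (illustrative)
gamma, alpha, R = 0.9, 0.01, 1.0  # discount, step size, projection radius

W0 = rng.normal(size=(m, d)) / np.sqrt(d)  # random initialization
c = rng.choice([-1.0, 1.0], size=m)        # fixed output-layer signs
W = W0.copy()

def value(s, W):
    # f(s; W): value estimate at state s
    return (c * np.maximum(W @ s, 0.0)).sum() / np.sqrt(m)

def grad(s, W):
    # gradient of f(s; W) with respect to each row W_i
    act = (W @ s > 0).astype(float)
    return (c * act)[:, None] * s[None, :] / np.sqrt(m)

def td_step(s, r, s_next, W):
    # semi-gradient TD(0) update on one transition (s, r, s_next)
    delta = r + gamma * value(s_next, W) - value(s, W)  # TD error
    W = W + alpha * delta * grad(s, W)
    # max-norm regularization (as assumed here): project each neuron's
    # weight vector back into an l2 ball of radius R around its initialization
    diff = W - W0
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    scale = np.minimum(1.0, R / np.maximum(norms, 1e-12))
    return W0 + diff * scale

A projection-free variant of the same update would simply omit the final projection step and return the unconstrained iterate.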