Sample Complexity and Overparameterization Bounds for Temporal-Difference Learning With Neural Network Approximation

Publication type:
Article
Authors:
Cayci, Semih; Satpathi, Siddhartha; He, Niao; Srikant, R.
Affiliations:
University of Illinois System; University of Illinois Urbana-Champaign; RWTH Aachen University; Mayo Clinic; Swiss Federal Institutes of Technology Domain; ETH Zurich
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2023.3234234
Publication date:
2023
Pages:
2891-2905
Keywords:
Neural networks; Approximation algorithms; Markov processes; Convergence; Complexity theory; Reinforcement learning; Kernel; Reinforcement learning (RL); Stochastic approximation; Temporal-difference (TD) learning
Abstract:
In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two algorithms used in practice, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
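
To make the algorithmic setup concrete, below is a minimal Python sketch of one step of neural TD(0) with max-norm regularization, in the spirit of the algorithms named in the abstract. All specifics are illustrative assumptions rather than the paper's exact construction: the two-layer ReLU parameterization, the step size alpha, the projection radius R, and the per-neuron projection onto a ball around the random initialization.

import numpy as np

# Minimal sketch (assumptions, not the paper's exact algorithm):
# two-layer ReLU network f(s; W) = (1/sqrt(m)) * sum_i c_i * relu(W_i . s)
# with a fixed output layer c and trainable input weights W.

rng = np.random.default_rng(0)
d, m = 4, 256                  # state dimension, network width (illustrative)
gamma, alpha, R = 0.9, 0.01, 1.0  # discount, step size, projection radius

W0 = rng.normal(size=(m, d)) / np.sqrt(d)  # random initialization
c = rng.choice([-1.0, 1.0], size=m)        # fixed output-layer signs
W = W0.copy()

def value(s, W):
    # f(s; W): value estimate at state s
    return (c * np.maximum(W @ s, 0.0)).sum() / np.sqrt(m)

def grad(s, W):
    # gradient of f(s; W) with respect to each row W_i
    act = (W @ s > 0).astype(float)
    return (c * act)[:, None] * s[None, :] / np.sqrt(m)

def td_step(s, r, s_next, W):
    # semi-gradient TD(0) update on one transition (s, r, s_next)
    delta = r + gamma * value(s_next, W) - value(s, W)  # TD error
    W = W + alpha * delta * grad(s, W)
    # max-norm regularization (as assumed here): project each neuron's
    # weight vector back into an l2 ball of radius R around its initialization
    diff = W - W0
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    scale = np.minimum(1.0, R / np.maximum(norms, 1e-12))
    return W0 + diff * scale

A projection-free variant of the same update would simply omit the final projection step and return the unconstrained iterate.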