The Projected Bellman Equation in Reinforcement Learning

Publication type:
Article
Author:
Meyn, Sean
Affiliation:
State University System of Florida; University of Florida
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2024.3409647
Publication date:
2024
Pages:
8323-8337
Keywords:
Q-learning; mathematical models; training; approximation algorithms; vectors; standards; function approximation; Markov processes; optimal control; reinforcement learning
Abstract:
Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. In the original tabular formulation, the goal is to compute exactly a solution to the discounted-cost optimality equation, and thereby obtain the optimal policy for a Markov Decision Process. The goal today is more modest: obtain an approximate solution within a prescribed function class. The standard algorithms are based on the same architecture as formulated in the 1980s, with the goal of finding a value function approximation that solves the so-called projected Bellman equation. While reinforcement learning has been an active research area for over four decades, there is little theory providing conditions for convergence of these Q-learning algorithms, or even existence of a solution to this equation. The purpose of this article is to show that a solution to the projected Bellman equation does exist, provided the function class is linear and the input used for training is a form of $\varepsilon$-greedy policy with sufficiently small $\varepsilon$. Moreover, under these conditions it is shown that the Q-learning algorithm is stable, in terms of bounded parameter estimates. Convergence remains one of many open topics for research.
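
For context, the projected Bellman equation referenced in the abstract can be written in the following standard form for a linear function class. The notation below (basis vector $\psi$, cost $c$, discount factor $\gamma$, parameter $\theta^\star$) is illustrative and not taken from the record itself; it is a sketch of the usual formulation, not a restatement of the article's equations.

\[
\mathbb{E}\Big[\,\psi(X_n,U_n)\Big(c(X_n,U_n) + \gamma \min_{u'} Q^{\theta^\star}(X_{n+1},u') - Q^{\theta^\star}(X_n,U_n)\Big)\Big] = 0,
\qquad
Q^{\theta}(x,u) = \theta^{\top}\psi(x,u),
\]

where the expectation is taken in steady state under the training policy (here, an $\varepsilon$-greedy policy). A solution $\theta^\star$ corresponds to a fixed point of the Bellman operator composed with projection onto the span of the basis functions; the article's contribution concerns existence of such a solution and boundedness of the Q-learning parameter estimates under these conditions.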