The Projected Bellman Equation in Reinforcement Learning

Publication type:
Article
Author:
Meyn, Sean
Affiliation:
State University System of Florida; University of Florida
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2024.3409647
Publication date:
2024
Pages:
8323-8337
Keywords:
Q-learning; mathematical models; training; approximation algorithms; vectors; standards; function approximation; Markov processes; optimal control; reinforcement learning
Abstract:
Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. In the original tabular formulation, the goal is to compute exactly a solution to the discounted-cost optimality equation, and thereby obtain the optimal policy for a Markov Decision Process. The goal today is more modest: obtain an approximate solution within a prescribed function class. The standard algorithms are based on the same architecture as formulated in the 1980s, with the goal of finding a value function approximation that solves the so-called projected Bellman equation. While reinforcement learning has been an active research area for over four decades, there is little theory providing conditions for convergence of these Q-learning algorithms, or even existence of a solution to this equation. The purpose of this article is to show that a solution to the projected Bellman equation does exist, provided the function class is linear and the input used for training is a form of $\varepsilon$-greedy policy with sufficiently small $\varepsilon$. Moreover, under these conditions it is shown that the Q-learning algorithm is stable, in terms of bounded parameter estimates. Convergence remains one of many open topics for research.
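
For context, the projected Bellman equation referenced in the abstract can be written in the following standard form for a linear function class. The notation below (basis vector $\psi$, cost $c$, discount factor $\gamma$, parameter $\theta^\star$) is illustrative and not taken from the record itself; it is a sketch of the usual formulation, not a restatement of the article's equations.

\[
\mathbb{E}\Big[\,\psi(X_n,U_n)\Big(c(X_n,U_n) + \gamma \min_{u'} Q^{\theta^\star}(X_{n+1},u') - Q^{\theta^\star}(X_n,U_n)\Big)\Big] = 0,
\qquad
Q^{\theta}(x,u) = \theta^{\top}\psi(x,u),
\]

where the expectation is taken in steady state under the training policy (here, an $\varepsilon$-greedy policy). A solution $\theta^\star$ corresponds to a fixed point of the Bellman operator composed with projection onto the span of the basis functions; the article's contribution concerns existence of such a solution and boundedness of the Q-learning parameter estimates under these conditions.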