Relative Q-Learning for Average-Reward Markov Decision Processes With Continuous States

Document type:

Article

Authors:

Yang, Xiangyu; Hu, Jiaqiao; Hu, Jian-Qiang

Affiliations:

Shandong University; State University of New York (SUNY) System; Stony Brook University; Fudan University

Journal:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL

ISSN/ISBN:

0018-9286

DOI:

10.1109/TAC.2024.3371380

Publication date:

2024

Pages:

6546-6560

Keywords:

Q-learning; approximation algorithms; mathematical models; Markov decision processes; trajectory; prediction algorithms; optimization; dynamic systems and control; Markov processes; online computation

Abstract:

Markov decision processes (MDPs) are widely used for modeling sequential decision-making problems under uncertainty. We propose an online algorithm for solving a class of average-reward MDPs with continuous state spaces in a model-free setting. The algorithm combines the classical relative Q-learning with an asynchronous averaging procedure, which permits the Q-value estimate at a state-action pair to be updated based on observations at other neighboring pairs sampled in subsequent iterations. These point estimates are then retained and used for constructing an interpolation-based function approximator that predicts the Q-function values at unexplored state-action pairs. We show that with probability one the sequence of function approximators converges to the optimal Q-function up to a constant. Numerical results on a simple benchmark example are reported to illustrate the algorithm.
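
For orientation, the classical relative Q-learning scheme that the abstract builds on can be sketched on a small finite MDP. This is not the authors' continuous-state algorithm: it omits the asynchronous averaging and the interpolation-based approximator, and the two-state toy model, the step-size schedule, the reference pair, and all names below are assumptions made for illustration only. Each update subtracts the Q-value at a fixed reference pair, which serves as a running estimate of the optimal average reward.

```python
import random


def relative_q_learning(num_steps=100_000, seed=0):
    """Tabular relative Q-learning on a hypothetical 2-state, 2-action MDP."""
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    ref = (0, 0)  # assumed reference pair; its Q-value is subtracted in every update
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]

    def step(state, action):
        # Toy dynamics (an assumption): being in state 1 pays reward 1, and
        # action a moves the system to state a with probability 0.9.
        reward = 1.0 if state == 1 else 0.0
        next_state = action if rng.random() < 0.9 else 1 - action
        return reward, next_state

    state = 0
    for _ in range(num_steps):
        action = rng.randrange(n_actions)  # uniform exploration
        reward, next_state = step(state, action)
        visits[state][action] += 1
        alpha = visits[state][action] ** -0.7  # diminishing step size per pair
        # Relative update: the usual Q-learning target minus Q at the reference pair.
        target = reward + max(q[next_state]) - q[ref[0]][ref[1]]
        q[state][action] += alpha * (target - q[state][action])
        state = next_state

    gain_estimate = q[ref[0]][ref[1]]
    return q, gain_estimate


q_table, gain = relative_q_learning()
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(2)]
```

In this toy model the best policy always chooses action 1, which keeps the system in state 1 about 90% of the time, so `gain` should settle near 0.9 and `greedy_policy` should select action 1 in both states. The learned table determines the optimal Q-function only up to an additive constant, in line with the convergence statement in the abstract.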