
Square-Root Regret Bounds for Continuous-Time Episodic Markov Decision Processes

Publication type:

Article; Early Access

Authors:

Gao, Xuefeng; Zhou, Xunyu

Affiliations:

Chinese University of Hong Kong; Columbia University

Journal:

Mathematics of Operations Research

ISSN/ISBN:

0364-765X

DOI:

10.1287/moor.2022.0283

Publication date:

2025

Keywords:

finite-horizon state

Abstract:

We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-horizon episodic setting. In contrast to discrete-time MDPs, the intertransition times of a continuous-time MDP are exponentially distributed with rate parameters depending on the state-action pair at each transition. We present a learning algorithm based on the methods of value iteration and upper confidence bound. We derive an upper bound on the worst-case expected regret for the proposed algorithm and establish a worst-case lower bound; both bounds are of the order of the square root of the number of episodes. Finally, we conduct simulation experiments to illustrate the performance of our algorithm.
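
To make the setting in the abstract concrete, the following is a minimal sketch of sampling one finite-horizon episode of a continuous-time MDP, in which the holding time before each transition is exponential with a rate that depends on the current state-action pair. It is not the authors' learning algorithm. The names `simulate_episode`, `rates`, `trans`, and `policy`, and the tabular data layout, are assumptions made purely for illustration.

```python
import random


def simulate_episode(rates, trans, policy, s0, horizon, rng):
    """Sample one finite-horizon episode of a continuous-time MDP.

    rates[s][a]  : rate of the exponential holding time in state s under action a
    trans[s][a]  : list of next-state probabilities for the pair (s, a)
    policy(s, t) : action chosen in state s at time t (hypothetical interface)
    Returns a list of (jump_time, state_left, action) triples.
    """
    t, s = 0.0, s0
    path = []
    while True:
        a = policy(s, t)
        # Holding time is exponential with a state-action dependent rate.
        tau = rng.expovariate(rates[s][a])
        if t + tau >= horizon:
            break  # the horizon is reached before the next transition
        t += tau
        path.append((t, s, a))
        # Draw the next state from the transition probabilities of (s, a).
        s = rng.choices(range(len(trans[s][a])), weights=trans[s][a])[0]
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    rates = [[1.0, 2.0], [3.0, 0.5]]
    trans = [[[0.0, 1.0], [0.5, 0.5]], [[1.0, 0.0], [0.5, 0.5]]]
    for jump_time, state, action in simulate_episode(
        rates, trans, lambda s, t: 0, 0, 5.0, rng
    ):
        print(f"t={jump_time:.3f}  left state {state} under action {action}")
```

Unlike a discrete-time episode, the number of transitions in each episode here is random, since it depends on how many exponential holding times fit within the horizon.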