Square-Root Regret Bounds for Continuous-Time Episodic Markov Decision Processes
Document type:
Article; Early Access
Authors:
Gao, Xuefeng; Zhou, Xunyu
Affiliations:
Chinese University of Hong Kong; Columbia University
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.2022.0283
Publication date:
2025
Keywords:
finite-horizon
state
Abstract:
We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-horizon episodic setting. In contrast to discrete-time MDPs, the intertransition times of a continuous-time MDP are exponentially distributed, with rate parameters that depend on the state-action pair at each transition. We present a learning algorithm based on value iteration and upper confidence bounds. We derive an upper bound on the worst-case expected regret of the proposed algorithm and establish a worst-case lower bound; both bounds are of the order of the square root of the number of episodes. Finally, we conduct simulation experiments to illustrate the performance of our algorithm.
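To make the model in the abstract concrete, the following minimal Python sketch simulates one finite-horizon episode of a continuous-time MDP in which the holding time at each step is exponentially distributed with a rate that depends on the current state-action pair. It illustrates the dynamics only and is not the learning algorithm from the article. The function name `simulate_episode`, the two-state example, and all rates, transition probabilities, and the policy are hypothetical values chosen for illustration.

```python
import random


def simulate_episode(rates, trans, policy, s0, horizon, rng):
    """Simulate one episode on [0, horizon); return a list of (time, state, action).

    rates[(s, a)]  -- assumed exponential holding-time rate for the pair (s, a)
    trans[(s, a)]  -- assumed dict mapping next state -> jump probability
    policy(s)      -- assumed stationary policy returning an action
    """
    t, s = 0.0, s0
    path = []
    while True:
        a = policy(s)
        path.append((t, s, a))
        # Holding time is Exp(rate), with the rate set by the state-action pair.
        tau = rng.expovariate(rates[(s, a)])
        if t + tau >= horizon:
            break  # the episode ends before the next transition occurs
        t += tau
        # Draw the next state from the jump distribution of (s, a).
        next_states, probs = zip(*trans[(s, a)].items())
        s = rng.choices(next_states, weights=probs, k=1)[0]
    return path


# Hypothetical two-state, two-action example (all numbers are illustrative).
rates = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 1.5}
trans = {(0, 0): {1: 1.0}, (0, 1): {1: 1.0}, (1, 0): {0: 1.0}, (1, 1): {0: 1.0}}
policy = lambda s: 1 if s == 0 else 0

path = simulate_episode(rates, trans, policy, s0=0, horizon=5.0,
                        rng=random.Random(0))
for t, s, a in path:
    print(f"t={t:.3f}  state={s}  action={a}")
```

Unlike a discrete-time episode, the number of decision points here is random: it depends on how many exponential holding times fit before the horizon.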