Generalized Second-Order Value Iteration in Markov Decision Processes
Publication Type:
Article
Authors:
Kamanchi, Chandramouli; Diddigi, Raghuram Bharadwaj; Bhatnagar, Shalabh
Affiliations:
Indian Institute of Science (IISC) - Bangalore
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2021.3112851
Publication Date:
2022
Pages:
4241-4247
Keywords:
Mathematical model
Convergence
Newton method
Standards
Approximation algorithms
Markov processes
Entropy
Markov decision process
Value iteration
Successive relaxation
Abstract:
Value iteration is a fixed-point iteration technique used to obtain the optimal value function and policy in a discounted-reward Markov decision process (MDP): a contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first-order method and may therefore require a large number of iterations to converge to the optimal solution. Successive relaxation is a popular technique for solving fixed-point equations, and it has been shown in the literature that, under a special structure of the MDP, the successive over-relaxation technique computes the optimal value function faster than standard value iteration. In this article, we propose a second-order value iteration procedure obtained by applying the Newton-Raphson method to the successive relaxation value iteration scheme. We prove that our algorithm converges globally and asymptotically to the optimal solution and establish its second-order convergence rate. Through experiments, we demonstrate the effectiveness of the proposed approach.
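The relaxation scheme underlying the paper can be illustrated with a short sketch. The following is a minimal, hypothetical example (toy MDP numbers and the relaxation factor `w` are illustrative assumptions, not taken from the article) showing standard value iteration as the special case `w = 1` of successive-relaxation value iteration, `V <- (1 - w) V + w T V`, where `T` is the Bellman optimality operator; both iterations share the same fixed point. The paper's second-order (Newton-Raphson) acceleration of this scheme is not reproduced here.

```python
import numpy as np

# Toy 3-state, 2-action discounted MDP (hypothetical numbers, for illustration).
gamma = 0.7
P = np.array([  # P[a, s, s2] = probability of moving s -> s2 under action a
    [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    [[0.5, 0.4, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]],
])
R = np.array([[1.0, 0.5], [0.0, 2.0], [0.3, 0.9]])  # R[s, a] = one-step reward

def bellman(V):
    """Bellman optimality operator: (T V)(s) = max_a R[s,a] + gamma * E[V(s')]."""
    return (R + gamma * np.einsum('asn,n->sa', P, V)).max(axis=1)

def relaxed_vi(w=1.0, tol=1e-10, max_iter=10_000):
    """Successive-relaxation value iteration: V <- (1 - w) V + w T V.
    w = 1.0 recovers standard value iteration; both share the fixed point V = T V."""
    V = np.zeros(len(R))
    for k in range(max_iter):
        V_new = (1.0 - w) * V + w * bellman(V)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, k + 1
        V = V_new
    return V, max_iter

V_std, n_std = relaxed_vi(w=1.0)   # standard value iteration
V_sor, n_sor = relaxed_vi(w=1.1)   # over-relaxation (w > 1), same fixed point
```

Here `w = 1.1` with `gamma = 0.7` keeps the relaxed operator a contraction (modulus at most |1 - w| + w*gamma = 0.87), so both runs converge to the same optimal value function; the paper's point is that, under a suitable MDP structure, a well-chosen `w > 1` contracts faster than the standard operator.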
Source URL: