Generalized Second-Order Value Iteration in Markov Decision Processes
Publication Type:
Article
Authors:
Kamanchi, Chandramouli; Diddigi, Raghuram Bharadwaj; Bhatnagar, Shalabh
Affiliations:
Indian Institute of Science (IISC) - Bangalore
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2021.3112851
Publication Date:
2022
Pages:
4241-4247
Keywords:
Mathematical model
Convergence
Newton method
Standards
Approximation algorithms
Markov processes
Entropy
Markov decision process
Value iteration
Successive relaxation
Abstract:
Value iteration is a fixed-point iteration technique used to obtain the optimal value function and policy in a discounted-reward Markov decision process (MDP): a contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first-order method and may therefore require a large number of iterations to converge to the optimal solution. Successive relaxation is a popular technique for solving fixed-point equations, and it has been shown in the literature that, under a special structure of the MDP, the successive over-relaxation technique computes the optimal value function faster than standard value iteration. In this article, we propose a second-order value iteration procedure obtained by applying the Newton-Raphson method to the successive relaxation value iteration scheme. We prove that our algorithm converges globally and asymptotically to the optimal solution and establish its second-order convergence rate. Through experiments, we demonstrate the effectiveness of the proposed approach.
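The relaxation scheme underlying the paper can be illustrated with a short sketch. The following is a minimal, hypothetical example (toy MDP numbers and the relaxation factor `w` are illustrative assumptions, not taken from the article) showing standard value iteration as the special case `w = 1` of successive-relaxation value iteration, `V <- (1 - w) V + w T V`, where `T` is the Bellman optimality operator; both iterations share the same fixed point. The paper's second-order (Newton-Raphson) acceleration of this scheme is not reproduced here.

```python
import numpy as np

# Toy 3-state, 2-action discounted MDP (hypothetical numbers, for illustration).
gamma = 0.7
P = np.array([  # P[a, s, s2] = probability of moving s -> s2 under action a
    [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    [[0.5, 0.4, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]],
])
R = np.array([[1.0, 0.5], [0.0, 2.0], [0.3, 0.9]])  # R[s, a] = one-step reward

def bellman(V):
    """Bellman optimality operator: (T V)(s) = max_a R[s,a] + gamma * E[V(s')]."""
    return (R + gamma * np.einsum('asn,n->sa', P, V)).max(axis=1)

def relaxed_vi(w=1.0, tol=1e-10, max_iter=10_000):
    """Successive-relaxation value iteration: V <- (1 - w) V + w T V.
    w = 1.0 recovers standard value iteration; both share the fixed point V = T V."""
    V = np.zeros(len(R))
    for k in range(max_iter):
        V_new = (1.0 - w) * V + w * bellman(V)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, k + 1
        V = V_new
    return V, max_iter

V_std, n_std = relaxed_vi(w=1.0)   # standard value iteration
V_sor, n_sor = relaxed_vi(w=1.1)   # over-relaxation (w > 1), same fixed point
```

Here `w = 1.1` with `gamma = 0.7` keeps the relaxed operator a contraction (modulus at most |1 - w| + w*gamma = 0.87), so both runs converge to the same optimal value function; the paper's point is that, under a suitable MDP structure, a well-chosen `w > 1` contracts faster than the standard operator.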
Source URL: