A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

Publication Type:
Article
Authors:
Moharrami, Mehrdad; Murthy, Yashaswini; Roy, Arghyadip; Srikant, R.
Affiliations:
University of Iowa; University of Illinois System; University of Illinois Urbana-Champaign; University of Illinois System; University of Illinois Urbana-Champaign; Indian Institute of Technology System (IIT System); Indian Institute of Technology (IIT) - Guwahati
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.2022.0139
Publication Date:
2025
Keywords:
formula
Abstract:
We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimate of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.
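To illustrate the kind of trajectory-based policy gradient the abstract describes, the following is a minimal sketch (not the paper's algorithm): a score-function (REINFORCE-style) Monte Carlo estimate of the gradient of the exponential cost E[exp(β·ΣC)] for a softmax policy on a hypothetical two-state, two-action MDP. The transition matrix `P`, cost matrix `C`, and risk parameter `beta` are illustrative assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action MDP (illustration only, not from the paper).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s'] transition probabilities
              [[0.5, 0.5], [0.1, 0.9]]])
C = np.array([[1.0, 2.0], [0.5, 1.5]])    # per-step cost c(s, a)
beta = 0.1                                 # risk-sensitivity parameter (assumed)

def policy(theta, s):
    # Softmax policy over actions in state s.
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def grad_log_policy(theta, s, a):
    # Gradient of log pi(a|s) for the softmax parameterization.
    g = np.zeros_like(theta)
    g[s] = -policy(theta, s)
    g[s, a] += 1.0
    return g

def estimate_gradient(theta, horizon=20, n_traj=500):
    # Score-function estimate of grad_theta E[exp(beta * total cost)]:
    # average of exp(beta * C_traj) * sum_t grad log pi(a_t | s_t).
    grad = np.zeros_like(theta)
    for _ in range(n_traj):
        s, total_cost, score = 0, 0.0, np.zeros_like(theta)
        for _ in range(horizon):
            a = rng.choice(2, p=policy(theta, s))
            total_cost += C[s, a]
            score += grad_log_policy(theta, s, a)
            s = rng.choice(2, p=P[s, a])
        grad += np.exp(beta * total_cost) * score
    return grad / n_traj

theta = np.zeros((2, 2))
g = estimate_gradient(theta)
print(g.shape)  # (2, 2)
```

Note the exp(β·C) weight in the estimator: because it is unbounded over long horizons, naive stochastic approximation with this estimator is problematic, which is the difficulty the article's smooth truncation is designed to address.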
Source URL: