A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

Publication Type:
Article
Authors:
Moharrami, Mehrdad; Murthy, Yashaswini; Roy, Arghyadip; Srikant, R.
Affiliations:
University of Iowa; University of Illinois System; University of Illinois Urbana-Champaign; University of Illinois System; University of Illinois Urbana-Champaign; Indian Institute of Technology System (IIT System); Indian Institute of Technology (IIT) - Guwahati
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.2022.0139
Publication Date:
2025
Keywords:
formula
Abstract:
We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimate of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.
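To illustrate the kind of trajectory-based policy gradient the abstract describes, the following is a minimal sketch (not the paper's algorithm): a score-function (REINFORCE-style) Monte Carlo estimate of the gradient of the exponential cost E[exp(β·ΣC)] for a softmax policy on a hypothetical two-state, two-action MDP. The transition matrix `P`, cost matrix `C`, and risk parameter `beta` are illustrative assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action MDP (illustration only, not from the paper).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s'] transition probabilities
              [[0.5, 0.5], [0.1, 0.9]]])
C = np.array([[1.0, 2.0], [0.5, 1.5]])    # per-step cost c(s, a)
beta = 0.1                                 # risk-sensitivity parameter (assumed)

def policy(theta, s):
    # Softmax policy over actions in state s.
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def grad_log_policy(theta, s, a):
    # Gradient of log pi(a|s) for the softmax parameterization.
    g = np.zeros_like(theta)
    g[s] = -policy(theta, s)
    g[s, a] += 1.0
    return g

def estimate_gradient(theta, horizon=20, n_traj=500):
    # Score-function estimate of grad_theta E[exp(beta * total cost)]:
    # average of exp(beta * C_traj) * sum_t grad log pi(a_t | s_t).
    grad = np.zeros_like(theta)
    for _ in range(n_traj):
        s, total_cost, score = 0, 0.0, np.zeros_like(theta)
        for _ in range(horizon):
            a = rng.choice(2, p=policy(theta, s))
            total_cost += C[s, a]
            score += grad_log_policy(theta, s, a)
            s = rng.choice(2, p=P[s, a])
        grad += np.exp(beta * total_cost) * score
    return grad / n_traj

theta = np.zeros((2, 2))
g = estimate_gradient(theta)
print(g.shape)  # (2, 2)
```

Note the exp(β·C) weight in the estimator: because it is unbounded over long horizons, naive stochastic approximation with this estimator is problematic, which is the difficulty the article's smooth truncation is designed to address.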
Source URL: