Stochastic Approximation for Risk-Aware Markov Decision Processes
Publication type:
Article
Authors:
Huang, Wenjie; Haskell, William B.
Affiliations:
Shenzhen Research Institute of Big Data; The Chinese University of Hong Kong, Shenzhen; Purdue University System; Purdue University
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2020.2989702
Publication date:
2021
Pages:
1314-1320
Keywords:
Markov decision processes (MDPs)
risk measure
saddle point
stochastic approximation
Q-learning
Abstract:
We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs Q-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g., conditional value-at-risk, optimized certainty equivalent, and absolute semideviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance ε > 0 on the optimal Q-value estimation gap and a learning rate k ∈ (1/2, 1], the overall convergence rate of our algorithm is Ω((ln(1/δε)/ε^2)^(1/k) + (ln(1/ε))^(1/(1-k))) with probability at least 1 − δ.