
Quantile Markov Decision Processes

Publication type:

Article; Early Access

Authors:

Li, Xiaocheng; Zhong, Huaiyang; Brandeau, Margaret L.

Affiliation:

Stanford University

Journal:

OPERATIONS RESEARCH

ISSN:

0030-364X

DOI:

10.1287/opre.2021.2123

Publication date:

2021

Keywords:

value-at-risk; time; approximations; regression; therapy

Abstract:

The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. In this paper, we consider the problem of optimizing the quantiles of the cumulative rewards of an MDP, which we refer to as a quantile Markov decision process (QMDP). We provide analytical results characterizing the optimal QMDP value function and present a dynamic programming-based algorithm to solve for the optimal policy. The algorithm also extends to the MDP problem with a conditional value-at-risk objective. We illustrate the practical relevance of our model by evaluating it on an HIV treatment initiation problem, in which patients aim to balance the potential benefits and risks of the treatment.
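To make the difference between the objectives concrete, the sketch below compares the expectation, a lower quantile, and the lower-tail conditional value-at-risk (CVaR) of two hypothetical cumulative-reward distributions. It is not the paper's dynamic programming algorithm. It only evaluates the objectives on fixed outcome distributions. The two "policies", their reward values and probabilities, and the helper names `mean`, `quantile`, and `cvar` are illustrative assumptions. The quantile is the standard lower quantile, q_tau(X) = inf{x : P(X <= x) >= tau}, and CVaR is taken as the average of the worst tau fraction of outcomes.

```python
from fractions import Fraction

# A distribution is a list of (cumulative_reward, probability) pairs.
# Fractions keep the arithmetic exact for this small illustration.

def mean(dist):
    """Expected cumulative reward (the traditional MDP objective)."""
    return sum(v * p for v, p in dist)

def quantile(dist, tau):
    """Lower tau-quantile: smallest reward x with P(X <= x) >= tau."""
    cum = Fraction(0)
    for value, prob in sorted(dist):
        cum += prob
        if cum >= tau:
            return value
    return max(v for v, _ in dist)

def cvar(dist, tau):
    """Lower-tail CVaR: average reward over the worst tau fraction of outcomes."""
    remaining, total = tau, Fraction(0)
    for value, prob in sorted(dist):
        take = min(prob, remaining)  # probability mass taken from this outcome
        total += take * value
        remaining -= take
        if remaining == 0:
            break
    return total / tau

# Hypothetical outcome distributions of two fixed policies.
SAFE = [(5, Fraction(1))]                          # always earns 5
RISKY = [(0, Fraction(1, 5)), (10, Fraction(4, 5))]  # 0 w.p. 0.2, 10 w.p. 0.8

tau = Fraction(1, 10)
for name, dist in [("safe", SAFE), ("risky", RISKY)]:
    print(name, mean(dist), quantile(dist, tau), cvar(dist, tau))
# safe 5 5 5
# risky 8 0 0
```

Under the expectation objective the risky policy wins (8 versus 5), but under the 10% quantile or 10% CVaR objective the safe policy wins (5 versus 0). At tau = 1/2 the ranking flips back (the risky policy's median is 10), which illustrates why a decision maker's choice of quantile level matters.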
