Markov Decision Problems Where Means Bound Variances
Publication type:
Article
Authors:
Arlotto, Alessandro; Gans, Noah; Steele, J. Michael
Affiliations:
Duke University; University of Pennsylvania; University of Pennsylvania
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.2014.1281
Publication date:
2014
Pages:
864-875
Keywords:
optimal sequential selection
undiscounted MDP
random-variables
random sample
monotone subsequences
optimality criterion
revenue management
sum constraint
tradeoffs
optimization
Abstract:
We identify a rich class of finite-horizon Markov decision problems (MDPs) for which the variance of the optimal total reward can be bounded by a simple linear function of its expected value. The class is characterized by three natural properties: reward nonnegativity and boundedness, existence of a do-nothing action, and optimal action monotonicity. These properties are commonly present and typically easy to check. Implications of the class properties and of the variance bound are illustrated by examples of MDPs from operations research, operations management, financial engineering, and combinatorial optimization.
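As a hedged sketch of the bound described in the abstract (the exact constant is not given in this record), the inequality can be written in LaTeX as below, where R_n is assumed to denote the optimal total reward over an n-period horizon and K an assumed uniform upper bound on single-period rewards:
% Illustrative form of the "means bound variances" inequality.
% Assumptions: R_n is the total reward under an optimal policy over an
% n-period horizon; K is a uniform upper bound on single-period rewards.
% The precise constant is not specified in this record.
\begin{equation*}
  \operatorname{Var}\!\left[ R_n \right] \;\le\; K \,\mathbb{E}\!\left[ R_n \right]
\end{equation*}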