A WEIGHTED MARKOV DECISION-PROCESS

Document type:
Article
Authors:
KRASS, D; FILAR, JA; SINHA, SS
Affiliations:
University System of Maryland; University of Maryland Baltimore County; Indian Statistical Institute; Indian Statistical Institute Delhi
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.40.6.1180
Publication date:
1992
Pages:
1180-1187
Keywords:
Abstract:
The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to neglect the future, concentrating on the short-term rewards, while the second tends to do the opposite. We consider a new reward criterion consisting of a weighted combination of these two criteria, thereby allowing the decision maker to place more or less emphasis on the short-term versus the long-term rewards by varying their weights. The mathematical implications of the new criterion include: the deterministic stationary policies can be outperformed by the randomized stationary policies, which in turn can be outperformed by the nonstationary policies; an optimal policy might not exist. We present an iterative algorithm for computing an epsilon-optimal nonstationary policy with a very simple structure.
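
To make the weighted criterion concrete, the following is a minimal sketch that evaluates a fixed stationary policy under a convex combination of the discounted and long-run average rewards. It is not the paper's algorithm. The function names, the weight `lam`, the `(1 - beta)` scaling of the discounted term, and the unichain assumption used to compute the average reward are all illustrative assumptions, since the abstract does not state the exact form of the criterion.

```python
import numpy as np

def discounted_value(P, r, beta):
    """Discounted reward of a stationary policy: v = (I - beta * P)^{-1} r."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - beta * P, r)

def average_value(P, r):
    """Long-run average reward, ASSUMING the chain induced by the policy is
    unichain, so the stationary distribution pi is unique and the gain is
    the same from every starting state."""
    n = len(r)
    # Solve pi P = pi together with sum(pi) = 1 as a least-squares system.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.full(n, pi @ r)

def weighted_value(P, r, beta, lam):
    """Hypothetical weighted criterion (an assumption, not taken from the
    abstract): lam * (1 - beta) * discounted + (1 - lam) * average.
    The (1 - beta) factor puts both terms on a per-step scale."""
    return lam * (1.0 - beta) * discounted_value(P, r, beta) \
        + (1.0 - lam) * average_value(P, r)

# Toy two-state chain under some fixed policy (values are illustrative).
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
r = np.array([1.0, 0.0])
w = weighted_value(P, r, beta=0.5, lam=0.5)
print(w)
```

Raising `lam` toward 1 emphasizes the short-term (discounted) rewards, while lowering it toward 0 emphasizes the long-term average, which is the trade-off the abstract describes. The sketch only evaluates one stationary policy; it does not search over the randomized or nonstationary policies that the abstract says can perform better.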