How does the value function of a Markov decision process depend on the transition probabilities?

Publication type:
Article
Author(s):
Muller, A
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.22.4.872
Publication date:
1997
Pages:
872-885
Keywords:
respect
Abstract:
The present work deals with the comparison of (discrete time) Markov decision processes (MDPs), which differ only in their transition probabilities. We show that the optimal value function of an MDP is monotone with respect to appropriately defined stochastic order relations. We also find conditions for continuity with respect to suitable probability metrics. The results are applied to some well-known examples, including inventory control and optimal stopping.
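
The monotonicity claim in the abstract can be illustrated numerically. The following is a minimal sketch and not the paper's own construction: it assumes a toy discounted MDP on five ordered states whose transitions form a clamped random walk, with rewards increasing in the state. The helper names (`random_walk_kernel`, `value_iteration`), the up-move probabilities, the action cost, and the discount factor are all hypothetical choices made for illustration. Two kernels are compared, where the second makes an upward move more likely for every state-action pair and therefore dominates the first in the usual stochastic order.

```python
import numpy as np

def random_walk_kernel(n_states, up_probs):
    """Build P[a, s, s'] for a clamped random walk (illustrative assumption).

    Under action a the state moves up one step with probability up_probs[a]
    and down one step otherwise; moves past either boundary stay in place.
    """
    P = np.zeros((len(up_probs), n_states, n_states))
    for a, p in enumerate(up_probs):
        for s in range(n_states):
            P[a, s, min(s + 1, n_states - 1)] += p
            P[a, s, max(s - 1, 0)] += 1.0 - p
    return P

def value_iteration(P, r, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Approximate the optimal discounted value function by value iteration.

    P has shape (actions, states, states); r has shape (states, actions).
    """
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # q[a, s] = r[s, a] + gamma * sum over s' of P[a, s, s'] * V[s']
        q = r.T + gamma * (P @ V)
        V_new = q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

n_states = 5
states = np.arange(n_states, dtype=float)
# Reward increases in the state; action 1 costs 0.5 more than action 0.
r = states[:, None] - 0.5 * np.arange(2)[None, :]

P_low = random_walk_kernel(n_states, up_probs=[0.3, 0.5])
P_high = random_walk_kernel(n_states, up_probs=[0.4, 0.6])  # dominates P_low

V_low = value_iteration(P_low, r)
V_high = value_iteration(P_high, r)

print("V_low :", np.round(V_low, 3))
print("V_high:", np.round(V_high, 3))
print("V_high >= V_low everywhere:", bool(np.all(V_high >= V_low - 1e-8)))
```

In this toy setting the ordering follows from a standard argument: the reward is increasing and the kernel is stochastically monotone, so the value function under `P_low` is increasing in the state, and replacing `P_low` by a stochastically larger kernel can only raise the expected continuation value. The general conditions and order relations are those given in the article itself.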