Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities

Document type:
Article
Authors:
Feinberg, Eugene A.; Kasyanov, Pavlo O.; Zgurovsky, Michael Z.
Affiliations:
State University of New York (SUNY) System; Stony Brook University; Ministry of Education & Science of Ukraine; Igor Sikorsky Kyiv Polytechnic Institute; National Academy of Sciences Ukraine; Institute for Applied System Analysis of the National Technical University of Ukraine Igor Sikorsky Kyiv Polytechnic Institute; Ministry of Education & Science of Ukraine; Igor Sikorsky Kyiv Polytechnic Institute
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.2015.0746
Publication date:
2016
Pages:
656-681
Keywords:
observed inventory systems incomplete information state CONVERGENCE optimality THEOREM MODEL
Abstract:
This paper describes sufficient conditions for the existence of optimal policies for partially observable Markov decision processes (POMDPs) with Borel state, observation, and action sets, when the goal is to minimize the expected total costs over finite or infinite horizons. For infinite-horizon problems, one-step costs are either discounted or assumed to be nonnegative. Action sets may be noncompact and one-step cost functions may be unbounded. The introduced conditions are also sufficient for the validity of optimality equations, semicontinuity of value functions, and convergence of value iterations to optimal values. Since POMDPs can be reduced to completely observable Markov decision processes (COMDPs), whose states are posterior state distributions, this paper focuses on the validity of the above-mentioned optimality properties for COMDPs. The central question is whether the transition probabilities for the COMDP are weakly continuous. We introduce sufficient conditions for this and show that the transition probabilities for a COMDP are weakly continuous, if transition probabilities of the underlying Markov decision process are weakly continuous and observation probabilities for the POMDP are continuous in total variation. Moreover, the continuity in total variation of the observation probabilities cannot be weakened to setwise continuity. The results are illustrated with counterexamples and examples.
Source URL: