FINITE-MEMORY SUBOPTIMAL DESIGN FOR PARTIALLY OBSERVED MARKOV DECISION-PROCESSES

Publication type:
Article
Authors:
WHITE, CC; SCHERER, WT
Affiliations:
University of Michigan System; University of Michigan; University of Virginia
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.42.3.439
Publication date:
1994
Pages:
439-455
Keywords:
Abstract:
We develop bounds on the value function and a suboptimal design for the partially observed Markov decision process. These bounds and the suboptimal design are based on the M most recent observations and actions. An a priori measure of the quality of these bounds is given. We show that larger M implies tighter bounds. An operations count analysis indicates that (#A #Z)^(M+1) (#S) multiplications and additions are required per successive approximations iteration of the suboptimal design algorithm, where A, Z, and S are the action, observation, and state spaces, respectively, suggesting the algorithm is of potential use for problems with large state spaces. A preliminary numerical study indicates that the quality of the suboptimal design can be excellent.
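To illustrate the operations count quoted in the abstract, the short Python sketch below evaluates (#A #Z)^(M+1) (#S) for a few memory lengths M. The function name `ops_per_iteration` and the example problem sizes are hypothetical and chosen only for illustration; they do not come from the article. The sketch only evaluates the stated formula and is not an implementation of the authors' algorithm.

```python
def ops_per_iteration(num_actions: int, num_observations: int,
                      num_states: int, memory: int) -> int:
    """Evaluate the count (#A #Z)^(M+1) (#S) stated in the abstract.

    All argument names are illustrative assumptions:
    num_actions = #A, num_observations = #Z, num_states = #S, memory = M.
    """
    return (num_actions * num_observations) ** (memory + 1) * num_states


if __name__ == "__main__":
    # Hypothetical problem sizes, not taken from the article.
    for m in range(4):
        print(m, ops_per_iteration(num_actions=3, num_observations=4,
                                   num_states=1000, memory=m))
```

Under this formula the count grows linearly in the number of states but geometrically in M, which is consistent with the abstract's remark that the approach may suit problems with large state spaces.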