Finite-Memory Strategies in POMDPs with Long-Run Average Objectives

Document type:
Article
Authors:
Chatterjee, Krishnendu; Saona, Raimundo; Ziliotto, Bruno
Affiliations:
Institute of Science & Technology - Austria; Universite PSL; Universite Paris-Dauphine; Centre National de la Recherche Scientifique (CNRS)
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.2020.1116
Publication date:
2022
Pages:
100-119
Keywords:
Markov decision-processes; games
Abstract:
Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems with probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with a long-run average objective, the decision maker has approximately optimal strategies with finite memory. This implies, notably, that approximating the long-run value is recursively enumerable, and that the value satisfies a weak continuity property with respect to the transition function.