On the empirical state-action frequencies in Markov decision processes under general policies

Document type:
Article
Authors:
Mannor, S; Tsitsiklis, JN
Affiliations:
McGill University; Massachusetts Institute of Technology (MIT)
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.1050.0148
Publication date:
2005
Pages:
545-561
Keywords:
large deviations
Abstract:
We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empirical frequency vector, under some policy, in a strong sense. Furthermore, we show that the probability of exceeding a given distance between the empirical frequency vector and the polytope decays exponentially with time under every policy. We provide similar results for vector-valued empirical rewards.
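
To make the objects in the abstract concrete, the following is a minimal Python sketch, not the authors' construction. It simulates a small randomly generated MDP under a stationary randomized policy (one special case of a general policy), records the empirical state-action frequency vector, and measures how far that vector is from satisfying the usual balance constraints sum_a x(s',a) = sum_{s,a} P(s'|s,a) x(s,a). The abstract does not spell out how the polytope is defined, so using these balance constraints is an assumption; the function names `empirical_frequencies` and `balance_residual`, the MDP sizes, and the random seed are all hypothetical.

```python
import numpy as np


def empirical_frequencies(P, policy, T, rng, s0=0):
    """Simulate T steps and return the empirical state-action frequencies.

    P has shape (S, A, S) with P[s, a, s2] = Pr(next = s2 | s, a).
    policy has shape (S, A); each row is a distribution over actions.
    """
    S, A, _ = P.shape
    counts = np.zeros((S, A))
    s = s0
    for _ in range(T):
        a = rng.choice(A, p=policy[s])   # sample an action from the policy
        counts[s, a] += 1
        s = rng.choice(S, p=P[s, a])     # sample the next state
    return counts / T


def balance_residual(P, x):
    """Largest violation of the (assumed) balance constraints at any state."""
    outflow = x.sum(axis=1)                  # sum_a x(s', a)
    inflow = np.einsum("sap,sa->p", P, x)    # sum_{s,a} P(s'|s,a) x(s,a)
    return float(np.abs(outflow - inflow).max())


# Hypothetical 3-state, 2-action MDP with strictly positive transitions,
# so every state is reachable from every other state.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))   # shape (3, 2, 3)
policy = rng.dirichlet(np.ones(2), size=3)   # shape (3, 2)

for T in (100, 1000, 10000):
    x = empirical_frequencies(P, policy, T, rng)
    print(T, balance_residual(P, x))
```

The printed residual is a crude proxy for the distance to the polytope and should tend to shrink as T grows. The sketch does not estimate the exponential decay rate stated in the abstract; that would require many independent runs per horizon.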