The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes
成果类型:
Article
署名作者:
Post, Ian; Ye, Yinyu
署名单位:
University of Waterloo; Stanford University
刊物名称:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISSBN:
0364-765X
DOI:
10.1287/moor.2014.0699
发表日期:
2015
页码:
859-868
关键词:
Complexity
Iteration
摘要:
We prove that the simplex method with the highest gain/most-negative-reduced cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the discount factor. For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n(3) m(2) log(2) n) iterations if the discount factor is uniform and O(n(5) m(3) log(2) n) iterations if each action has a distinct discount factor. Previously the simplex method was known to run in polynomial time only for discounted MDPs where the discount was bounded away from 1.