Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time
Publication type:
Article
Author:
Wang, Mengdi
Affiliation:
Princeton University
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.2019.1000
Publication date:
2020
Pages:
517-546
Keywords:
Complexity
Abstract:
We propose a novel randomized linear programming algorithm for approximating the optimal policy of the discounted-reward and average-reward Markov decision problems. By leveraging the value-policy duality, the algorithm adaptively samples state-action-state transitions and makes exponentiated primal-dual updates. We show that it finds an ε-optimal policy using nearly linear runtime in the worst case for a fixed value of the discount factor. When the Markov decision process is ergodic and specified in some special data formats, for fixed values of certain ergodicity parameters, the algorithm finds an ε-optimal policy using sample size and time linear in the total number of state-action pairs, which is sublinear in the input size. These results provide a new avenue and complexity benchmarks for solving stochastic dynamic programs.
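The value-policy duality and the exponentiated primal-dual updates described in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm. The saddle-point form min over v, max over μ of (1-γ) q^T v + Σ_{s,a} μ(s,a) (r(s,a) + γ Σ_{s'} P(s'|s,a) v(s') - v(s)), with μ on the simplex over state-action pairs, is assumed here, as are the value box [0, 1/(1-γ)], the exploration mix delta, the step sizes alpha and beta, and the hypothetical function name randomized_primal_dual_mdp. Each iteration samples one state-action pair from a mixture of the current dual iterate and the uniform distribution, samples one next state, takes a projected stochastic gradient step on v and a multiplicative-weights step on μ, and finally reads a randomized policy off the averaged dual iterate.

```python
import numpy as np


def randomized_primal_dual_mdp(P, r, gamma, q, iters=20000, alpha=0.01,
                               beta=0.01, delta=0.1, seed=0):
    """Hypothetical stochastic primal-dual sketch for a discounted MDP.

    P: transition tensor, P[s, a, s'] = Pr(s' | s, a)
    r: rewards in [0, 1], shape (S, A)
    q: initial-state distribution, shape (S,)
    Returns a randomized policy pi[s, a] built from the averaged dual iterate.
    """
    S, A = r.shape
    rng = np.random.default_rng(seed)
    v_max = 1.0 / (1.0 - gamma)   # assumed box for the value vector
    v = np.zeros(S)               # primal variable (value estimate)
    log_mu = np.zeros(S * A)      # dual variable over (s, a), kept in log space
    mu_sum = np.zeros(S * A)      # running sum for the averaged dual iterate

    for _ in range(iters):
        mu = np.exp(log_mu - log_mu.max())
        mu /= mu.sum()
        mu_sum += mu
        # Adaptive sampling: mostly follow mu, plus uniform exploration (assumed).
        xi = (1.0 - delta) * mu + delta / (S * A)
        k = rng.choice(S * A, p=xi)
        s, a = divmod(k, A)
        s_next = rng.choice(S, p=P[s, a])  # one sampled transition s -> s_next
        # Sampled Bellman residual r(s, a) + gamma * v(s') - v(s).
        residual = r[s, a] + gamma * v[s_next] - v[s]
        # Primal step: unbiased estimate of grad_v, then project onto the box.
        grad_v = (1.0 - gamma) * q
        grad_v[s_next] += gamma * mu[k] / xi[k]
        grad_v[s] -= mu[k] / xi[k]
        v = np.clip(v - alpha * grad_v, 0.0, v_max)
        # Dual step: exponentiated (multiplicative-weights) ascent on mu,
        # using an importance-weighted estimate of the residual vector.
        log_mu[k] += beta * residual / xi[k]

    mu_bar = (mu_sum / iters).reshape(S, A)
    return mu_bar / mu_bar.sum(axis=1, keepdims=True)


# Toy check (hypothetical MDP): action 0 pays 1, action 1 pays 0,
# and transitions ignore the action, so action 0 is optimal in every state.
P = np.full((2, 2, 2), 0.5)
r = np.array([[1.0, 0.0], [1.0, 0.0]])
q = np.array([0.5, 0.5])
pi = randomized_primal_dual_mdp(P, r, gamma=0.9, q=q)
print(np.round(pi, 3))  # action 0 should receive most of the probability
```

Keeping the dual variable in log space makes the multiplicative-weights update numerically stable, and the uniform mixing bounds the importance weights 1/xi by (S*A)/delta. The step sizes and sampling distribution analyzed in the paper may differ from the ones chosen in this sketch.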
Source URL: