Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time

Publication type:

Article

Author:

Wang, Mengdi

Affiliation:

Princeton University

Journal:

MATHEMATICS OF OPERATIONS RESEARCH

ISSN/ISBN:

0364-765X

DOI:

10.1287/moor.2019.1000

Publication date:

2020

Pages:

517-546

Keywords:

Complexity

Abstract:

We propose a novel randomized linear programming algorithm for approximating the optimal policy of the discounted-reward and average-reward Markov decision problems. By leveraging the value-policy duality, the algorithm adaptively samples state-action-state transitions and makes exponentiated primal-dual updates. We show that it finds an ε-optimal policy using nearly linear runtime in the worst case for a fixed value of the discount factor. When the Markov decision process is ergodic and specified in some special data formats, for fixed values of certain ergodicity parameters, the algorithm finds an ε-optimal policy using sample size and time linear in the total number of state-action pairs, which is sublinear in the input size. These results provide a new venue and complexity benchmarks for solving stochastic dynamic programs.
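
For orientation, the "value-policy duality" mentioned in the abstract can be illustrated with the textbook linear program for a discounted Markov decision problem. This is a sketch of the standard formulation, not necessarily the exact one used in the article. The notation is assumed here rather than taken from the record: $\gamma \in (0,1)$ is the discount factor, $P_a$ and $r_a$ are the transition matrix and reward vector under action $a$, $q$ is an initial state distribution, $v$ is the value vector, and $\mu_a$ is the state-action occupancy vector for action $a$.

```latex
% Primal LP (value side); all symbols are assumed notation, see the lead-in.
\begin{align*}
  \min_{v}\quad & (1-\gamma)\, q^{\top} v \\
  \text{s.t.}\quad & (I - \gamma P_a)\, v \;\ge\; r_a \quad \text{for every action } a .
\end{align*}

% Dual LP (policy side): mu_a(i) is the discounted occupancy of the pair (i, a).
\begin{align*}
  \max_{\mu \ge 0}\quad & \sum_{a} \mu_a^{\top} r_a \\
  \text{s.t.}\quad & \sum_{a} (I - \gamma P_a)^{\top} \mu_a \;=\; (1-\gamma)\, q .
\end{align*}

% Equivalent saddle-point (minimax) form obtained from the Lagrangian.
\[
  \min_{v}\; \max_{\mu \ge 0}\;
  (1-\gamma)\, q^{\top} v \;+\; \sum_{a} \mu_a^{\top}\bigl( r_a - (I - \gamma P_a)\, v \bigr).
\]
```

Under this normalization the entries of $\mu$ sum to one, and a policy can be read off a dual solution by choosing action $a$ in state $i$ with probability proportional to $\mu_a(i)$. A primal-dual method of the kind the abstract describes would update $v$ and $\mu$ from sampled transitions rather than from the full matrices $P_a$.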
