Reinforcement Learning in Robust Markov Decision Processes
Type:
Article
Authors:
Lim, Shiau Hong; Xu, Huan; Mannor, Shie
Affiliations:
National University of Singapore; Technion Israel Institute of Technology
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.2016.0779
Publication Date:
2016
Pages:
1325-1353
Keywords:
Abstract:
An important challenge in Markov decision processes (MDPs) is to ensure robustness with respect to unexpected or adversarial system behavior. A standard paradigm for tackling this challenge is the robust MDP framework, which models the parameters as arbitrary elements of predefined uncertainty sets and seeks the minimax policy: the policy that performs best under the worst realization of the parameters in the uncertainty set. A crucial issue of the robust MDP framework, largely unaddressed in the literature, is how to find an appropriate description of the uncertainty in a principled, data-driven way. In this paper we address this problem using an online learning approach: we devise an algorithm that, without knowing the true uncertainty model, is able to adapt its level of protection to the uncertainty and, in the long run, performs as well as the minimax policy would if the true uncertainty model were known. Indeed, the algorithm achieves regret bounds similar to those of a standard MDP in which no parameter is adversarial, which shows that robust learning can be adapted to handle uncertainty in MDPs at virtually no extra cost. To the best of our knowledge, this is the first attempt to learn uncertainty in robust MDPs.
Source URL:
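To make the minimax formulation in the abstract concrete, below is a minimal sketch of robust value iteration: the agent maximizes value while nature adversarially picks the worst transition vector from a given uncertainty set. This is an illustration of the general robust MDP paradigm the abstract describes, not the paper's algorithm (which learns the uncertainty model online rather than assuming it is given). All names here (`P_candidates`, `R`, `gamma`, the finite-candidate-set form of the uncertainty set) are illustrative assumptions.

```python
import numpy as np

def robust_value_iteration(P_candidates, R, gamma=0.9, tol=1e-8, max_iter=1000):
    """Sketch of robust value iteration over a finite uncertainty set.

    P_candidates[s][a]: list of candidate transition probability vectors
                        (each a length-n_states array) for state s, action a.
    R: (n_states, n_actions) reward matrix. gamma: discount factor.
    """
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        Q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # Inner minimization: nature picks the worst transition
                # vector in the uncertainty set for this (s, a) pair.
                worst = min(p @ V for p in P_candidates[s][a])
                Q[s, a] = R[s, a] + gamma * worst
        V_new = Q.max(axis=1)  # Outer maximization: agent's best response.
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)  # minimax value and minimax policy
```

The sketch treats the uncertainty set as known; the paper's contribution is precisely the setting where it is not, and the learner must adapt its level of protection from data while matching the regret of the known-model minimax policy.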