您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 运营管理 > Operations Research > 2021

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

成果类型：

Article; Early Access

署名作者：

Cen, Shicong; Cheng, Chen; Chen, Yuxin; Wei, Yuting; Chi, Yuejie

署名单位：

Carnegie Mellon University; Stanford University; Princeton University; University of Pennsylvania

刊物名称：

OPERATIONS RESEARCH

ISSN/ISSBN：

0030-364X

DOI：

10.1287/opre.2021.2151

发表日期：

2021

关键词：

Reinforcement learning natural policy gradient methods entropy regularization GLOBAL CONVERGENCE

摘要：

Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization-an algorithmic scheme that encourages exploration-and is closely related to soft policy iteration and trust region policy optimization. Despite the empirical success, the theoretical underpinnings for NPG methods remain limited even for the tabular setting. This paper develops nonasymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on discounted Markov decision processes (MDPs). Assuming access to exact policy evaluation, we demonstrate that the algorithm converges linearly-even quadratically, once it enters a local region around the optimal policy-when computing optimal value functions of the regularized MDP. Moreover, the algorithm is provably stable vis-a-vis inexactness of policy evaluation. Our convergence results accommodate a wide range of learning rates and shed light upon the role of entropy regularization in enabling fast convergence.

来源URL：

访问原文