Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes
Publication type:
Article
Authors:
Lan, Guanghui
Affiliations:
University System of Georgia; Georgia Institute of Technology
Journal:
MATHEMATICAL PROGRAMMING
ISSN/ISBN:
0025-5610
DOI:
10.1007/s10107-022-01816-5
Publication date:
2023
Pages:
1059-1106
Keywords:
Abstract:
We present new policy mirror descent (PMD) methods for solving reinforcement learning (RL) problems with either strongly convex or general convex regularizers. By exploring the structural properties of these overall highly nonconvex problems, we show that the PMD methods exhibit a fast linear rate of convergence to global optimality. We develop stochastic counterparts of these methods and establish an O(1/epsilon) (resp., O(1/epsilon^2)) sampling complexity for solving these RL problems with strongly (resp., general) convex regularizers using different sampling schemes, where epsilon denotes the target accuracy. We further show that the complexity for computing the gradients of these regularizers, if necessary, can be bounded by O{(log_gamma(epsilon)) [(1 - gamma)L/mu]^{1/2} log(1/epsilon)} (resp., O{(log_gamma(epsilon)) (L/epsilon)^{1/2}}) for problems with strongly (resp., general) convex regularizers. Here gamma denotes the discounting factor. To the best of our knowledge, these complexity bounds, along with our algorithmic developments, appear to be new in both the optimization and RL literature. The introduction of these convex regularizers also greatly enhances the flexibility, and thus expands the applicability, of RL models.
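The sketch below illustrates one basic policy mirror descent step with a KL (entropy) Bregman divergence in the tabular setting, the building block the abstract refers to. It is a minimal sketch under stated assumptions, not the paper's algorithm: it omits the convex regularizer h and the paper's stepsize rules, adopts a cost-minimization convention (so the update moves probability mass away from actions with large Q-values), and the names pmd_update, q_values, and step_size are illustrative choices, not identifiers from the paper.

```python
# Minimal sketch of one PMD step with a KL Bregman divergence (tabular case).
# Assumptions (not from the paper): cost-minimization convention, no convex
# regularizer h, and illustrative names/shapes for the inputs.
import numpy as np

def pmd_update(policy, q_values, step_size):
    """Solve, per state s, the prox-step
         min_p  step_size * <q_values(s, .), p> + KL(p || policy(. | s)),
       whose closed-form solution is the multiplicative (softmax) update
         p(a) proportional to policy(a | s) * exp(-step_size * q_values(s, a))."""
    # policy:   (num_states, num_actions) row-stochastic matrix pi_k
    # q_values: (num_states, num_actions) action-value estimates for pi_k
    logits = np.log(policy + 1e-12) - step_size * q_values
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    new_policy = np.exp(logits)
    new_policy /= new_policy.sum(axis=1, keepdims=True)
    return new_policy

# Illustrative usage on random data (hypothetical sizes).
rng = np.random.default_rng(0)
num_states, num_actions = 4, 3
policy = np.full((num_states, num_actions), 1.0 / num_actions)
q_values = rng.standard_normal((num_states, num_actions))
policy = pmd_update(policy, q_values, step_size=0.5)
```

In the stochastic variants summarized above, exact Q-values would be replaced by sampled estimates, which is where the stated O(1/epsilon) and O(1/epsilon^2) sampling complexities come into play.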