Constrained Cross-Entropy Method for Safe Reinforcement Learning
Publication Type:
Article
Authors:
Wen, Min; Topcu, Ufuk
Affiliations:
University of Pennsylvania; Google; The University of Texas at Austin
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2020.3015931
Publication Date:
2021
Pages:
3123-3137
Keywords:
trajectory optimization
safety
linear programming
mathematical model
convergence
optimal control
machine learning algorithms
safe reinforcement learning
statistical learning
Abstract:
We study a safe reinforcement learning problem in which the constraints are defined as the expected cost over finite-length trajectories. We propose a constrained cross-entropy-based method to solve this problem. The key idea is to transform the original constrained optimization problem into an unconstrained one with a surrogate objective. The method explicitly tracks its performance with respect to constraint satisfaction and thus is well suited for safety-critical applications. We show that the asymptotic behavior of the proposed algorithm can be described almost surely by that of an ordinary differential equation. We then give sufficient conditions on the properties of this differential equation for the convergence of the proposed algorithm. Finally, we demonstrate the performance of the proposed algorithm in two simulation examples. In a constrained linear-quadratic regulator example, we observe that the algorithm converges to the global optimum with high probability. In a 2-D navigation example, we find that the algorithm effectively learns feasible policies without assumptions on the feasibility of initial policies, even with non-Markovian objective and constraint functions.
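To make the surrogate-objective idea from the abstract concrete, the following is a minimal Python sketch of a constrained cross-entropy loop, not the authors' implementation: a Gaussian sampling distribution over policy parameters is assumed, and the helper sample_objective_and_cost as well as the parameters cost_limit, elite_frac, n_samples, and n_iters are illustrative assumptions introduced here.

```python
import numpy as np

def constrained_cross_entropy(
    sample_objective_and_cost,  # hypothetical helper: theta -> (objective estimate, constraint-cost estimate)
    dim,                        # dimension of the policy parameter vector
    cost_limit=0.0,             # constraint threshold: theta is feasible iff cost <= cost_limit
    n_samples=100,
    elite_frac=0.1,
    n_iters=50,
    seed=0,
):
    """Sketch of a constrained cross-entropy method.

    The elite set is chosen by a surrogate ranking that explicitly tracks
    constraint satisfaction: while too few samples are feasible, elites are
    those with the smallest constraint cost; once enough samples are
    feasible, elites are the feasible samples with the best objective.
    """
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * n_samples))

    for _ in range(n_iters):
        # Sample candidate policy parameters from the current Gaussian.
        thetas = rng.normal(mean, std, size=(n_samples, dim))
        scores = np.array([sample_objective_and_cost(t) for t in thetas])
        objectives, costs = scores[:, 0], scores[:, 1]

        feasible = costs <= cost_limit
        if feasible.sum() < n_elite:
            # Infeasible regime: rank by constraint cost to drive it down.
            elite_idx = np.argsort(costs)[:n_elite]
        else:
            # Feasible regime: maximize the objective among feasible samples.
            feas_idx = np.flatnonzero(feasible)
            elite_idx = feas_idx[np.argsort(-objectives[feas_idx])[:n_elite]]

        # Refit the sampling distribution to the elite set.
        elites = thetas[elite_idx]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6

    return mean
```

In this sketch, the two-regime ranking plays the role of the unconstrained surrogate objective described in the abstract: no feasible initial policy is required, because iterations that produce too few feasible samples are spent purely on reducing constraint cost before the objective is optimized.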