Constrained Cross-Entropy Method for Safe Reinforcement Learning
Publication Type:
Article
Authors:
Wen, Min; Topcu, Ufuk
Affiliations:
University of Pennsylvania; Google; The University of Texas at Austin
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2020.3015931
Publication Date:
2021
Pages:
3123-3137
Keywords:
trajectory optimization
safety
linear programming
mathematical model
convergence
optimal control
machine learning algorithms
safe reinforcement learning
statistical learning
Abstract:
We study a safe reinforcement learning problem in which the constraints are defined as the expected cost over finite-length trajectories. We propose a constrained cross-entropy-based method to solve this problem. The key idea is to transform the original constrained optimization problem into an unconstrained one with a surrogate objective. The method explicitly tracks its performance with respect to constraint satisfaction and thus is well suited for safety-critical applications. We show that the asymptotic behavior of the proposed algorithm can be described almost surely by that of an ordinary differential equation. We then give sufficient conditions on the properties of this differential equation for the convergence of the proposed algorithm. Finally, we demonstrate the performance of the proposed algorithm in two simulation examples. In a constrained linear-quadratic regulator example, we observe that the algorithm converges to the global optimum with high probability. In a 2-D navigation example, we find that the algorithm effectively learns feasible policies without assumptions on the feasibility of initial policies, even with non-Markovian objective and constraint functions.
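To make the surrogate-objective idea from the abstract concrete, the following is a minimal Python sketch of a constrained cross-entropy loop, not the authors' implementation: a Gaussian sampling distribution over policy parameters is assumed, and the helper sample_objective_and_cost as well as the parameters cost_limit, elite_frac, n_samples, and n_iters are illustrative assumptions introduced here.

```python
import numpy as np

def constrained_cross_entropy(
    sample_objective_and_cost,  # hypothetical helper: theta -> (objective estimate, constraint-cost estimate)
    dim,                        # dimension of the policy parameter vector
    cost_limit=0.0,             # constraint threshold: theta is feasible iff cost <= cost_limit
    n_samples=100,
    elite_frac=0.1,
    n_iters=50,
    seed=0,
):
    """Sketch of a constrained cross-entropy method.

    The elite set is chosen by a surrogate ranking that explicitly tracks
    constraint satisfaction: while too few samples are feasible, elites are
    those with the smallest constraint cost; once enough samples are
    feasible, elites are the feasible samples with the best objective.
    """
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * n_samples))

    for _ in range(n_iters):
        # Sample candidate policy parameters from the current Gaussian.
        thetas = rng.normal(mean, std, size=(n_samples, dim))
        scores = np.array([sample_objective_and_cost(t) for t in thetas])
        objectives, costs = scores[:, 0], scores[:, 1]

        feasible = costs <= cost_limit
        if feasible.sum() < n_elite:
            # Infeasible regime: rank by constraint cost to drive it down.
            elite_idx = np.argsort(costs)[:n_elite]
        else:
            # Feasible regime: maximize the objective among feasible samples.
            feas_idx = np.flatnonzero(feasible)
            elite_idx = feas_idx[np.argsort(-objectives[feas_idx])[:n_elite]]

        # Refit the sampling distribution to the elite set.
        elites = thetas[elite_idx]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6

    return mean
```

In this sketch, the two-regime ranking plays the role of the unconstrained surrogate objective described in the abstract: no feasible initial policy is required, because iterations that produce too few feasible samples are spent purely on reducing constraint cost before the objective is optimized.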