Safe Value Functions
Publication Type:
Article
Authors:
Massiani, Pierre-Francois; Heim, Steve; Solowjow, Friedrich; Trimpe, Sebastian
Affiliations:
RWTH Aachen University; Max Planck Society; Massachusetts Institute of Technology (MIT)
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2022.3200948
Publication Date:
2023
Pages:
2743-2757
Keywords:
safety
task analysis
trajectory
dynamical systems
reinforcement learning (RL)
optimal control
kernel
strong duality
value functions
viability
Abstract:
Safety constraints and optimality are important but sometimes conflicting criteria for controllers. Although these criteria are often addressed separately with different tools to maintain formal guarantees, it is also common practice in reinforcement learning (RL) to simply modify reward functions by penalizing failures, with the penalty treated as a mere heuristic. We rigorously examine how penalties relate to both safety and optimality, and formalize sufficient conditions for safe value functions (SVFs): value functions that are both optimal for a given task and enforce safety constraints. We reveal this structure by examining when rewards preserve viability under optimal control, and show that there always exists a finite penalty that induces an SVF. This penalty is not unique but unbounded above: larger penalties do not harm optimality. Although it is often not possible to compute the minimum required penalty, we reveal a clear structure in how the penalty, rewards, discount factor, and dynamics interact. This insight suggests practical, theory-guided heuristics for designing reward functions for control problems where safety is important.
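The abstract's central claim, that some finite failure penalty makes the task-optimal policy safe and that any larger penalty leaves optimality on the viable states untouched, can be seen in a toy example. The sketch below is illustrative only and is not the paper's construction: it builds a three-state MDP in which a "risky" action pays an immediate reward but enters an absorbing failure state, then runs tabular value iteration while sweeping the failure penalty. All states, rewards, and the discount factor are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm or notation):
# a deterministic 3-state MDP with an absorbing failure state.
GAMMA = 0.9
ACTIONS = ["safe", "risky"]

# States: 0 = start, 1 = goal (absorbing), 2 = failure (absorbing).
N_S, N_A = 3, 2
P = np.zeros((N_S, N_A), dtype=int)   # deterministic next state
R = np.zeros((N_S, N_A))              # immediate reward

P[0, 0], R[0, 0] = 1, 1.0    # "safe":  start -> goal, reward +1
P[0, 1], R[0, 1] = 2, 2.0    # "risky": start -> failure, reward +2
P[1, :] = 1                  # goal is absorbing with zero reward
P[2, :] = 2                  # failure is absorbing with zero reward

def optimal_at_start(penalty, iters=200):
    """Value iteration with a one-time penalty on the transition into failure."""
    R_pen = R.copy()
    R_pen[0, 1] -= penalty           # failure penalty folded into the reward
    V = np.zeros(N_S)
    for _ in range(iters):
        Q = R_pen + GAMMA * V[P]     # Q(s, a) = r(s, a) + gamma * V(s')
        V = Q.max(axis=1)
    return Q[0].argmax(), V[0]

for p in [0.0, 0.5, 1.0, 2.0, 10.0]:
    a, v = optimal_at_start(p)
    print(f"penalty={p:5.1f}  greedy action at start: {ACTIONS[a]:5s}  V(start)={v:.2f}")
```

With these illustrative numbers, the greedy action at the start state flips from "risky" to "safe" once the penalty reaches the finite threshold p = 1, and V(start) stays at 1.00 for every larger penalty, matching the abstract's point that the required penalty is finite, is not unique, and that increasing it further does not harm optimality on the viable states.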