Probably Approximately Correct Learning in Adversarial Environments With Temporal Logic Specifications
Publication Type:
Article
Authors:
Wen, Min; Topcu, Ufuk
Affiliations:
University of Pennsylvania; University of Texas System; University of Texas Austin
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2021.3115080
Publication Date:
2022
Pages:
5055-5070
Keywords:
Formal verification
Machine learning algorithms
Statistical learning
Abstract:
Reinforcement learning (RL) algorithms have been used to learn how to perform tasks in uncertain and partially unknown environments. In practice, environments are usually uncontrolled and may affect task performance in an adversarial way. In this article, we model the interaction between an RL agent and its potentially adversarial environment as a turn-based zero-sum stochastic game. The task requirements are represented both qualitatively, as a subset of linear temporal logic (LTL) specifications, and quantitatively, as a reward function. For each case in which the LTL specification is realizable and can be equivalently transformed into a deterministic Büchi automaton, we show that there always exists a memoryless almost-sure winning strategy that is epsilon-optimal for the discounted-sum objective for any positive epsilon. We propose a probably approximately correct (PAC) learning algorithm that learns such a strategy efficiently in an online manner with a priori unknown reward functions and unknown transition distributions. To the best of our knowledge, this is the first result on PAC learning in stochastic games with independent quantitative and qualitative objectives.
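
As a concrete illustration of the quantitative side of the model (not the paper's PAC algorithm), the minimal Python sketch below computes the discounted-sum value of a toy turn-based zero-sum stochastic game by value iteration, with the agent maximizing and the adversarial environment minimizing. All state names, actions, transition probabilities, and rewards are hypothetical examples, and the qualitative (Büchi) objective is omitted.

    # Minimal sketch: discounted-sum value iteration on a toy turn-based
    # zero-sum stochastic game. All quantities below are hypothetical.
    GAMMA = 0.9

    # States owned by the agent (maximizer) and the environment (minimizer).
    agent_states = {"s0", "s2"}
    env_states = {"s1"}

    # (state, action) -> distribution over successor states.
    transitions = {
        ("s0", "a"): {"s1": 1.0},
        ("s0", "b"): {"s2": 1.0},
        ("s1", "x"): {"s0": 0.5, "s2": 0.5},
        ("s1", "y"): {"s2": 1.0},
        ("s2", "a"): {"s2": 1.0},
    }

    # (state, action) -> immediate reward to the agent.
    rewards = {("s0", "a"): 1.0, ("s0", "b"): 0.0, ("s1", "x"): 0.0,
               ("s1", "y"): -1.0, ("s2", "a"): 0.5}

    def value_iteration(n_iters=200):
        states = agent_states | env_states
        V = {s: 0.0 for s in states}
        for _ in range(n_iters):
            new_V = {}
            for s in states:
                acts = [a for (st, a) in transitions if st == s]
                q = [rewards[(s, a)] + GAMMA * sum(p * V[t]
                     for t, p in transitions[(s, a)].items()) for a in acts]
                # The agent maximizes the discounted sum; the adversarial
                # environment minimizes it at the states it controls.
                new_V[s] = max(q) if s in agent_states else min(q)
            V = new_V
        return V

    print(value_iteration())

In the paper's setting, the game would additionally be restricted to strategies that are almost-sure winning for the Büchi condition derived from the LTL specification; the sketch only shows the discounted-sum zero-sum computation.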