Probably Approximately Correct Learning in Adversarial Environments With Temporal Logic Specifications
Publication Type:
Article
Authors:
Wen, Min; Topcu, Ufuk
Affiliations:
University of Pennsylvania; University of Texas System; University of Texas Austin
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2021.3115080
Publication Date:
2022
Pages:
5055-5070
Keywords:
Formal verification
Machine learning algorithms
Statistical learning
Abstract:
Reinforcement learning (RL) algorithms have been used to learn how to perform tasks in uncertain and partially unknown environments. In practice, environments are usually uncontrolled and may affect task performance in an adversarial way. In this article, we model the interaction between an RL agent and its potentially adversarial environment as a turn-based zero-sum stochastic game. The task requirements are represented both qualitatively, as a subset of linear temporal logic (LTL) specifications, and quantitatively, as a reward function. For each case in which the LTL specification is realizable and can be equivalently transformed into a deterministic Büchi automaton, we show that there always exists a memoryless almost-sure winning strategy that is epsilon-optimal for the discounted-sum objective for any positive epsilon. We propose a probably approximately correct (PAC) learning algorithm that learns such a strategy efficiently in an online manner with a priori unknown reward functions and unknown transition distributions. To the best of our knowledge, this is the first result on PAC learning in stochastic games with independent quantitative and qualitative objectives.
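
As a concrete illustration of the quantitative side of the model (not the paper's PAC algorithm), the minimal Python sketch below computes the discounted-sum value of a toy turn-based zero-sum stochastic game by value iteration, with the agent maximizing and the adversarial environment minimizing. All state names, actions, transition probabilities, and rewards are hypothetical examples, and the qualitative (Büchi) objective is omitted.

    # Minimal sketch: discounted-sum value iteration on a toy turn-based
    # zero-sum stochastic game. All quantities below are hypothetical.
    GAMMA = 0.9

    # States owned by the agent (maximizer) and the environment (minimizer).
    agent_states = {"s0", "s2"}
    env_states = {"s1"}

    # (state, action) -> distribution over successor states.
    transitions = {
        ("s0", "a"): {"s1": 1.0},
        ("s0", "b"): {"s2": 1.0},
        ("s1", "x"): {"s0": 0.5, "s2": 0.5},
        ("s1", "y"): {"s2": 1.0},
        ("s2", "a"): {"s2": 1.0},
    }

    # (state, action) -> immediate reward to the agent.
    rewards = {("s0", "a"): 1.0, ("s0", "b"): 0.0, ("s1", "x"): 0.0,
               ("s1", "y"): -1.0, ("s2", "a"): 0.5}

    def value_iteration(n_iters=200):
        states = agent_states | env_states
        V = {s: 0.0 for s in states}
        for _ in range(n_iters):
            new_V = {}
            for s in states:
                acts = [a for (st, a) in transitions if st == s]
                q = [rewards[(s, a)] + GAMMA * sum(p * V[t]
                     for t, p in transitions[(s, a)].items()) for a in acts]
                # The agent maximizes the discounted sum; the adversarial
                # environment minimizes it at the states it controls.
                new_V[s] = max(q) if s in agent_states else min(q)
            V = new_V
        return V

    print(value_iteration())

In the paper's setting, the game would additionally be restricted to strategies that are almost-sure winning for the Büchi condition derived from the LTL specification; the sketch only shows the discounted-sum zero-sum computation.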