Exploiting Partial Observability for Optimal Deception

Publication type:
Article
Authors:
Karabag, Mustafa O.; Ornik, Melkior; Topcu, Ufuk
Affiliations:
University of Texas System; University of Texas Austin; University of Illinois System; University of Illinois Urbana-Champaign
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2022.3209959
Publication date:
2023
Pages:
4443-4450
Keywords:
Computational complexity; deception under partial observations; Markov decision processes (MDPs)
Abstract:
Deception is a useful tool in situations where an agent operates in the presence of its adversaries. We consider a setting in which a supervisor provides a reference policy to an agent, expects the agent to operate in an environment by following the reference policy, and partially observes the agent's behavior. The agent instead follows a deceptive policy to achieve a different task. We model the environment as a Markov decision process and study the synthesis of optimal deceptive policies under partial observability. We formalize the notion of deception as a hypothesis testing problem and show that the synthesis of optimal deceptive policies is nondeterministic polynomial-time hard (NP-hard). As an approximation, we consider the class of mixture policies, which yields a convex optimization formulation of the deception problem, and we give an algorithm that converges to the optimal mixture policy. We also consider the special class of Markov decision processes whose transition and observation functions are deterministic. For this class, we give a randomized path-planning algorithm that generates a path for the agent in polynomial time and achieves the optimal value of the considered objective function.
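
The convexity claim for mixture policies can be made concrete. If each base policy i induces a distribution Q_i over the finite set of observation sequences, then a mixture with weights w induces the affine combination sum_i w_i Q_i, so the Kullback-Leibler divergence from the reference policy's observation distribution (the natural objective under a hypothesis-testing formalization of deception) is convex in w. The Python sketch below minimizes that objective over the weight simplex. It is illustrative only: the arrays Q and p_ref, the function names, and the exponentiated-gradient iteration are assumptions made for the sketch, not the paper's algorithm (which the abstract does not specify), and the constraint that the agent's own task be achieved is omitted.

import numpy as np

def mixture_kl(w, Q, p_ref, eps=1e-12):
    # KL divergence D(w @ Q || p_ref) between the mixture's observation
    # distribution and the reference policy's observation distribution.
    m = np.clip(w @ Q, eps, None)
    p = np.clip(p_ref, eps, None)
    return float(np.sum(m * np.log(m / p)))

def optimal_mixture_weights(Q, p_ref, iters=5000, lr=0.1, eps=1e-12):
    # Minimize the convex objective mixture_kl over the probability simplex
    # with exponentiated-gradient (mirror) descent. Q[i] is the observation
    # distribution induced by base policy i (a hypothetical input here).
    k = Q.shape[0]
    w = np.full(k, 1.0 / k)              # start from the uniform mixture
    for _ in range(iters):
        m = np.clip(w @ Q, eps, None)
        # d/dw_i of sum_o m(o) log(m(o)/p_ref(o))
        #   = sum_o Q[i, o] * (log(m(o)/p_ref(o)) + 1)
        grad = Q @ (np.log(m / np.clip(p_ref, eps, None)) + 1.0)
        w = w * np.exp(-lr * grad)       # multiplicative update keeps w > 0
        w /= w.sum()                     # renormalize onto the simplex
    return w

# Toy usage: three base policies over four observation outcomes.
Q = np.array([[0.70, 0.10, 0.10, 0.10],
              [0.10, 0.70, 0.10, 0.10],
              [0.25, 0.25, 0.25, 0.25]])
p_ref = np.array([0.40, 0.30, 0.20, 0.10])
w = optimal_mixture_weights(Q, p_ref)
print(w, mixture_kl(w, Q, p_ref))

Exponentiated gradient is used here because its multiplicative update keeps the weights strictly positive and normalized, so no explicit projection onto the simplex is needed; for a convex objective it converges to the optimal weights under a suitably small step size, mirroring the abstract's claim that the mixture-policy relaxation admits a convergent algorithm.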