Entropy Maximization for Partially Observable Markov Decision Processes

Publication Type:
Article
Authors:
Savas, Yagiz; Hibbard, Michael; Wu, Bo; Tanaka, Takashi; Topcu, Ufuk
Affiliations:
University of Texas System; University of Texas Austin
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2022.3183564
Publication Date:
2022
Pages:
6948-6955
Keywords:
Autonomous systems; Entropy; Stochastic processes
Abstract:
We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of an agent's trajectories to an outside observer while guaranteeing the completion of a task expressed by a reward function. Focusing on finite-state controllers (FSCs) with deterministic memory transitions, we show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of the parametric Markov chain (pMC) induced by such FSCs. This relationship allows us to recast the entropy maximization problem as a so-called parameter synthesis problem for the induced pMC. We then present an algorithm to synthesize an FSC that locally maximizes the entropy of a POMDP over FSCs with the same number of memory states. In a numerical example, we highlight the benefit of using an entropy-maximizing FSC compared with an FSC that simply finds a feasible policy for accomplishing a task.
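To make the abstract's idea concrete, the following is a minimal, hypothetical sketch (not the authors' algorithm) of the two ingredients it mentions: computing the trajectory entropy of an absorbing Markov chain, and maximizing that entropy over a single transition parameter `p` of a toy induced pMC subject to a task constraint. The entropy formula used here, H = Σ_s ξ_s · h_s, where ξ are expected visit counts and h_s is the one-step entropy at state s, is the standard expression for absorbing chains; the one-state chain, the parameter `p`, and the expected-path-length constraint are all illustrative assumptions, and the grid search stands in for the paper's local optimization.

```python
import numpy as np

def trajectory_entropy(Q, mu):
    """Entropy (in bits) of the trajectory distribution of an absorbing MC.

    Q  : transitions among transient states (row sums may be < 1; the
         deficit is the probability of moving to an absorbing state).
    mu : initial distribution over the transient states.
    Uses H = sum_s xi_s * h_s with xi = mu (I - Q)^{-1} the expected
    visit counts and h_s the one-step entropy at state s.
    """
    n = Q.shape[0]
    xi = mu @ np.linalg.inv(np.eye(n) - Q)   # expected visits per state
    absorb = 1.0 - Q.sum(axis=1)             # absorption prob. per state
    H = 0.0
    for s in range(n):
        probs = np.append(Q[s], absorb[s])
        probs = probs[probs > 0]              # 0 log 0 = 0 convention
        H += xi[s] * -(probs * np.log2(probs)).sum()
    return H

# Toy pMC: one transient state that self-loops with probability p and
# reaches the goal (absorbs) with probability 1 - p.  Task constraint
# (illustrative): expected path length 1 / (1 - p) must not exceed 5.
best = None
for p in np.linspace(0.01, 0.99, 99):
    if 1.0 / (1.0 - p) > 5.0 + 1e-9:          # infeasible: task too slow
        continue
    H = trajectory_entropy(np.array([[p]]), np.array([1.0]))
    if best is None or H > best[1]:
        best = (p, H)

print(best)  # entropy-maximizing feasible parameter and its entropy
```

In this toy chain the entropy grows with `p`, so the constrained optimum sits at the boundary of the feasibility set (p ≈ 0.8, where the expected path length hits 5), illustrating the trade-off the abstract describes between unpredictability and task performance.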