
Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme

Document type:

Article

Authors:

Reisinger, Christoph; Tam, Jonathan

Affiliation:

University of Oxford

Journal:

Mathematics of Operations Research

ISSN/ISBN:

0364-765X

DOI:

10.1287/moor.2023.0172

Publication date:

2025

Keywords:

variational inequalities; policy iteration; quickest detection; monotone systems; HJB equations; convergence

Abstract:

We consider Markov decision processes where the state of the chain is only given at chosen observation times and at a cost. Optimal strategies involve the optimization of observation times as well as the subsequent action values. We consider the finite horizon and discounted infinite horizon problems as well as an extension with parameter uncertainty. By including the time elapsed from observations as part of the augmented Markov system, the value function satisfies a system of quasivariational inequalities (QVIs). Such a class of QVIs can be seen as an extension to the interconnected obstacle problem. We prove a comparison principle for this class of QVIs, which implies the uniqueness of solutions to our proposed problem. Penalty methods are then utilized to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications that illustrate our framework.