Decomposable Markov Decision Processes: A Fluid Optimization Approach
Publication type:
Article
Authors:
Bertsimas, Dimitris; Misic, Velibor V.
Affiliations:
Massachusetts Institute of Technology (MIT); University of California System; University of California Los Angeles
Journal:
OPERATIONS RESEARCH
ISSN:
0030-364X
DOI:
10.1287/opre.2016.1531
Publication year:
2016
Pages:
1537-1555
Keywords:
linear-programming approach
relaxations
performance
Abstract:
Decomposable Markov decision processes (MDPs) are problems in which the stochastic system can be decomposed into multiple individual components. Although such MDPs arise naturally in many practical applications, they are often difficult to solve exactly because the state space of the complete system grows exponentially with the number of components. In this paper, we propose an approximate solution approach to decomposable MDPs that is based on re-solving a fluid linear optimization formulation of the problem at each decision epoch. This formulation tractably approximates the problem by modeling transition behavior at the level of the individual components rather than the complete system. We prove that our fluid formulation provides a tighter bound on the optimal value function than three state-of-the-art formulations: the approximate linear optimization formulation, the classical Lagrangian relaxation formulation, and a novel, alternate Lagrangian relaxation based on relaxing an action-consistency constraint. We provide a numerical demonstration of the effectiveness of the approach on multiarmed bandit problems, where we show that it delivers near-optimal performance and outperforms state-of-the-art algorithms.
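To make the idea concrete, the following is a minimal illustrative sketch, not the paper's exact formulation: a finite-horizon "fluid" LP for a small two-armed restless bandit, in which occupation-measure variables track each arm's state distribution separately and the one-pull-per-epoch requirement is imposed only in expectation. The transition matrices, rewards, horizon, and variable naming below are all made-up assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up instance: 2 arms, horizon 4, 2 states per arm, actions {0: rest, 1: pull}.
A, T, S, U = 2, 4, 2, 2
P = np.empty((A, U, S, S))           # P[a, u, s, s'] = per-component transition probability
for a in range(A):
    P[a, 0] = np.eye(S)              # resting leaves an arm's state unchanged (assumption)
P[0, 1] = np.array([[0.7, 0.3], [0.4, 0.6]])
P[1, 1] = np.array([[0.5, 0.5], [0.2, 0.8]])
r = np.array([[1.0, 2.0], [0.5, 3.0]])  # r[a, s]: reward for pulling arm a in state s
alpha = np.full((A, S), 1.0 / S)        # initial state distribution of each arm

# Variable x[a, t, s, u]: fraction of arm a in state s taking action u at epoch t.
n = A * T * S * U
def idx(a, t, s, u):
    return ((a * T + t) * S + s) * U + u

# Objective: maximize expected total reward (linprog minimizes, so negate).
c = np.zeros(n)
for a in range(A):
    for t in range(T):
        for s in range(S):
            c[idx(a, t, s, 1)] = -r[a, s]

A_eq, b_eq = [], []
# Initial-distribution constraints, one per (arm, state).
for a in range(A):
    for s in range(S):
        row = np.zeros(n)
        for u in range(U):
            row[idx(a, 0, s, u)] = 1.0
        A_eq.append(row); b_eq.append(alpha[a, s])
# Per-component flow conservation: the mass entering state s' at epoch t
# equals the mass flowing in under each arm's own transition matrix.
for a in range(A):
    for t in range(1, T):
        for s2 in range(S):
            row = np.zeros(n)
            for u in range(U):
                row[idx(a, t, s2, u)] = 1.0
            for s in range(S):
                for u in range(U):
                    row[idx(a, t - 1, s, u)] -= P[a, u, s, s2]
            A_eq.append(row); b_eq.append(0.0)

# Coupling constraint, relaxed to expectation: at most one pull per epoch on average.
A_ub, b_ub = [], []
for t in range(T):
    row = np.zeros(n)
    for a in range(A):
        for s in range(S):
            row[idx(a, t, s, 1)] = 1.0
    A_ub.append(row); b_ub.append(1.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None), method="highs")
bound = -res.fun   # an upper bound on the optimal expected reward of the coupled MDP
```

The key point the sketch illustrates is the source of tractability: the LP has variables and flow constraints per component (A·T·S·U variables here), not over the joint state space of size S^A, and the components interact only through the relaxed per-epoch budget row.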
Source URL: