An adaptive sampling algorithm for solving Markov decision processes
Publication type:
Article
Authors:
Chang, HS; Fu, MC; Hu, JQ; Marcus, SI
Affiliations:
Sogang University; University System of Maryland; University of Maryland College Park
Journal:
OPERATIONS RESEARCH
ISSN:
0030-364X
DOI:
10.1287/opre.1040.0145
Publication year:
2005
Pages:
126-139
Keywords:
Abstract:
Based on recent results for multiarmed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with finite state and action spaces. The algorithm adaptively chooses which action to sample as the sampling process proceeds and generates an asymptotically unbiased estimator, whose bias is bounded by a quantity that converges to zero at rate (ln N)/N, where N is the total number of samples used per sampled state in each stage. The worst-case running-time complexity of the algorithm is O((|A|N)^H), independent of the size of the state space, where |A| is the size of the action space and H is the horizon length. The algorithm can be used to create an approximate receding-horizon control for solving infinite-horizon MDPs. To illustrate the algorithm, computational results are reported on simple examples from inventory control.
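The abstract's description can be sketched in code: at each sampled state the algorithm allocates its N samples across actions using a UCB-style bandit index, recursing over the remaining horizon, and returns a sample-count-weighted average of the per-action estimates. The sketch below is illustrative only; the function names, the UCB constant, and the `simulate(state, action) -> (reward, next_state)` interface are assumptions, not details from the paper.

```python
import math

def adaptive_sample(state, stage, H, actions, simulate, N):
    """Estimate the optimal value of `state` at the given stage of a
    finite-horizon MDP by adaptive (bandit-based) action sampling.
    A hedged sketch of the idea, not the paper's exact algorithm."""
    if stage == H:
        return 0.0  # terminal stage: no reward-to-go
    counts = {a: 0 for a in actions}
    totals = {a: 0.0 for a in actions}
    # Initialization: sample every action once.
    for a in actions:
        reward, nxt = simulate(state, a)
        totals[a] += reward + adaptive_sample(nxt, stage + 1, H, actions, simulate, N)
        counts[a] = 1
    # Remaining samples: pick the action with the highest UCB index,
    # so sampling effort concentrates on apparently good actions.
    for n in range(len(actions), N):
        ucb = {a: totals[a] / counts[a] + math.sqrt(2.0 * math.log(n) / counts[a])
               for a in actions}
        a = max(ucb, key=ucb.get)
        reward, nxt = simulate(state, a)
        totals[a] += reward + adaptive_sample(nxt, stage + 1, H, actions, simulate, N)
        counts[a] += 1
    # Value estimate: average of per-action means weighted by sample counts.
    return sum((counts[a] / N) * (totals[a] / counts[a]) for a in actions)

# Tiny illustrative MDP (hypothetical): two actions, action 1 always pays 1,
# action 0 pays 0, state never changes; optimal value over H=2 stages is 2.
def simulate(state, action):
    return float(action), state

estimate = adaptive_sample(0, 0, 2, [0, 1], simulate, N=8)
```

Each state sampled at a stage triggers N recursive calls per sample at the next stage, which is where the O((|A|N)^H) worst-case running time, independent of the state-space size, comes from.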