Empirical Dynamic Programming

Publication Type:
Article
Authors:
Haskell, William B.; Jain, Rahul; Kalathil, Dileep
Affiliations:
National University of Singapore; University of Southern California; University of California, Berkeley
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN:
0364-765X
DOI:
10.1287/moor.2015.0733
Publication Date:
2016
Pages:
402-429
Keywords:
Markov decision processes; learning algorithms; convergence
Abstract:
We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator of classical value iteration is replaced by an empirical estimate to obtain empirical value iteration (EVI); policy evaluation and policy improvement in classical policy iteration are similarly replaced by simulation to obtain empirical policy iteration (EPI). Thus, these empirical dynamic programming algorithms involve iterating a random operator, the empirical Bellman operator. We introduce notions of probabilistic fixed points for such random monotone operators and develop a stochastic dominance framework for their convergence analysis, which we then use to give sample-complexity bounds for both EVI and EPI. We then provide several variations and extensions, namely asynchronous empirical dynamic programming and a minimax empirical dynamic program, and show how the latter can also be used to solve the dynamic newsvendor problem. Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms.
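
To make the construction in the abstract concrete: classical value iteration applies the Bellman operator (Tv)(s) = max_a { r(s, a) + gamma * E[v(s')] }, and EVI replaces the expectation with a sample average over simulated next states, (T_hat_n v)(s) = max_a { r(s, a) + (gamma / n) * sum_{i=1..n} v(s'_i) }. The Python sketch below is a minimal reconstruction of this idea from the abstract alone, not the authors' implementation; the simulator interface (sample_next_state), the reward function, and all parameter values are hypothetical.

import numpy as np

def empirical_value_iteration(n_states, n_actions, reward, sample_next_state,
                              gamma=0.95, n_samples=100, n_iters=500, seed=None):
    # Iterate the *empirical* Bellman operator: for each (s, a), the exact
    # expectation E[v(s')] is replaced by an average over n_samples simulated
    # next states drawn from the simulator sample_next_state(s, a, n, rng).
    rng = np.random.default_rng(seed)
    v = np.zeros(n_states)
    for _ in range(n_iters):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                next_states = sample_next_state(s, a, n_samples, rng)
                q[s, a] = reward(s, a) + gamma * v[next_states].mean()
        v = q.max(axis=1)  # greedy maximization over actions
    return v, q.argmax(axis=1)

# Example usage on a tiny random MDP (purely illustrative).
rng0 = np.random.default_rng(0)
P = rng0.dirichlet(np.ones(4), size=(4, 2))  # P[s, a] is a distribution over s'
sampler = lambda s, a, n, rng: rng.choice(4, size=n, p=P[s, a])
v, policy = empirical_value_iteration(4, 2, lambda s, a: float(s == a), sampler,
                                      n_samples=200, n_iters=100, seed=1)
print("empirical value estimates:", v, "greedy policy:", policy)

An analogous sketch for EPI would replace exact policy evaluation with Monte Carlo rollouts of the current policy before each improvement step; because the iterated operator is random, successive iterates fluctuate around a probabilistic fixed point rather than converging exactly, which is what motivates the paper's stochastic dominance analysis.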