您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 管理科学与工程 > Mathematics of Operations Research > 2006 > 2期

Performance loss bounds for approximate value iteration with state aggregation

成果类型：

Article

署名作者：

Van Roy, Benjamin

署名单位：

Stanford University

刊物名称：

MATHEMATICS OF OPERATIONS RESEARCH

ISSN/ISSBN：

0364-765X

DOI：

10.1287/moor.1060.0188

发表日期：

2006

页码：

234-244

关键词：

convergence algorithm average

摘要：

We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to using invariant distributions of appropriate policies as projection weights. Such projection weighting relates to what is done by temporal-difference learning. Our analysis also leads to the first performance loss bound for approximate value iteration with an average-cost objective.