
On the Taylor Expansion of Value Functions

Type:

Article

Authors:

Braverman, Anton; Gurvich, Itai; Huang, Junfei

Affiliations:

Northwestern University; Chinese University of Hong Kong

Journal:

OPERATIONS RESEARCH

ISSN/ISBN:

0030-364X

DOI:

10.1287/opre.2019.1903

Publication date:

2020

Pages:

631-654

Keywords:

approximations; optimality

Abstract:

We introduce a framework for approximate dynamic programming that we apply to discrete-time chains on $\mathbb{Z}_+^d$ with countable action sets. The framework is grounded in the approximation of the (controlled) chain's generator by that of another Markov process. In simple terms, our approach stipulates applying a second-order Taylor expansion to the value function, replacing the Bellman equation with one in continuous space and time in which the transition matrix is reduced to its first and second moments. In some cases, the resulting equation can be interpreted as a Hamilton-Jacobi-Bellman equation for a Brownian control problem. When tractable, the Taylored equation serves as a useful modeling tool. More generally, it is a starting point for approximation algorithms. We develop bounds on the optimality gap, that is, the suboptimality introduced by using the control produced by the Taylored equation. These bounds can be viewed as a conceptual underpinning, analytical rather than relying on weak convergence arguments, for the good performance of controls derived from Brownian approximations. We prove that under suitable conditions and for suitably large initial states, (1) the optimality gap is smaller than a $(1-\alpha)$ fraction of the optimal value, where $\alpha \in (0, 1)$ is the discount factor, and (2) the gap can be further expressed as the infinite-horizon discounted value with a lower-order per-period reward. Computationally, our framework leads to an aggregation approach with performance guarantees. Although the guarantees are grounded in partial differential equation theory, the practical use of this approach requires no knowledge of that theory.
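
As a rough illustration of the expansion described in the abstract, the sketch below writes out one way the second-order Taylor step can be carried out for a discounted problem. The notation ($V$ for the value function, $r$ for the per-period reward, $P^u$ for the transition matrix under action $u$, and $\mu_u$, $\sigma_u^2$ for the increment moments) is assumed here for illustration and does not appear in this record; the paper should be consulted for the exact statement and its conditions.

```latex
% Discounted Bellman equation on the lattice (assumed notation)
V(x) = \max_{u} \Big\{ r(x,u) + \alpha \sum_{y} P^{u}(x,y)\, V(y) \Big\}

% Second-order Taylor expansion of V around the current state x
V(y) \approx V(x) + \nabla V(x)^{\top} (y-x)
       + \tfrac{1}{2}\, (y-x)^{\top} \nabla^{2} V(x)\, (y-x)

% First and second moments of the one-step increment under action u
\mu_u(x) = \sum_{y} P^{u}(x,y)\, (y-x), \qquad
\sigma_u^{2}(x) = \sum_{y} P^{u}(x,y)\, (y-x)(y-x)^{\top}

% Substituting the expansion, using \sum_y P^u(x,y) = 1, and moving V(x) to one side
0 = \max_{u} \Big\{ r(x,u) + \alpha\, \mu_u(x)^{\top} \nabla V(x)
      + \tfrac{\alpha}{2}\, \mathrm{trace}\big( \sigma_u^{2}(x)\, \nabla^{2} V(x) \big)
      - (1-\alpha)\, V(x) \Big\}
```

Only $\mu_u$ and $\sigma_u^2$ enter the last line, which is the sense in which the transition matrix is reduced to its first and second moments.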
