Approximation Benefits of Policy Gradient Methods with Aggregated States
Publication type:
Article
Author(s):
Russo, Daniel
Affiliation:
Columbia University
Journal:
MANAGEMENT SCIENCE
ISSN/ISBN:
0025-1909
DOI:
10.1287/mnsc.2023.4788
Publication date:
2023
Pages:
6898-6911
Keywords:
reinforcement learning
approximate dynamic programming
policy gradient methods
state aggregation
Abstract:
Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state-aggregated representations, in which the state space is partitioned and either the policy or the value function approximation is held constant over partitions. It shows that a policy gradient method converges to a policy whose per-period regret is bounded by epsilon, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as epsilon/(1-gamma), where gamma is the discount factor. Faced with inherent approximation error, methods that locally optimize the true decision objective can be far more robust.
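For intuition about the setting described in the abstract, below is a minimal, hypothetical sketch (not code from the paper) of policy gradient with a state-aggregated softmax policy on a small synthetic MDP. The transition kernel P, reward matrix R, partition map phi, and the exact-gradient routine are illustrative assumptions: gradients are computed exactly from the policy gradient theorem rather than estimated from samples, and all states in a partition share the same action logits.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Small synthetic MDP (hypothetical example, not from the paper) ---
S, A, gamma = 12, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(S, A))       # rewards r(s, a)
rho = np.full(S, 1.0 / S)                    # initial-state distribution

# --- State aggregation: partition the 12 states into 4 clusters ---
K = 4
phi = np.repeat(np.arange(K), S // K)        # phi[s] = index of the partition containing s

def policy(theta):
    """Softmax policy held constant over each partition (state-aggregated)."""
    logits = theta[phi]                      # (S, A): states in a partition share logits
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def q_values(pi):
    """Exact Q^pi and V^pi by solving the Bellman equations (small MDP, direct solve)."""
    P_pi = np.einsum('sax,sa->sx', P, pi)    # state-to-state transitions under pi
    r_pi = (pi * R).sum(axis=1)
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return R + gamma * P @ v, v

def occupancy(pi):
    """Normalized discounted state-occupancy measure d^pi under initial distribution rho."""
    P_pi = np.einsum('sax,sa->sx', P, pi)
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
    return (1.0 - gamma) * d

# --- Exact policy gradient ascent on the aggregated parameters ---
theta = np.zeros((K, A))
for _ in range(5000):
    pi = policy(theta)
    Q, V = q_values(pi)
    d = occupancy(pi)
    adv = Q - V[:, None]                     # advantage A^pi(s, a)
    grad_s = d[:, None] * pi * adv           # per-state gradient of the objective w.r.t. logits
    grad = np.zeros_like(theta)
    np.add.at(grad, phi, grad_s)             # sum gradients of states within each partition
    theta += 1.0 * grad                      # step size chosen for this small example

pi = policy(theta)
_, V = q_values(pi)
print("J(pi) =", rho @ V)
```

The key point the sketch illustrates is that the optimization target is the true discounted return of the aggregated policy; the aggregation only restricts which policies are representable, so the loss from approximation is governed by how much the state-action values vary within a partition.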