您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 管理科学与工程 > Mathematics of Operations Research > 2023 > 4期

Optimal Oracle Inequalities for Projected Fixed-Point Equations, with Applications to Policy Evaluation

成果类型：

Article

署名作者：

Mou, Wenlong; Pananjady, Ashwin; Wainwright, Martin J.

署名单位：

University of California System; University of California Berkeley; University System of Georgia; Georgia Institute of Technology; Massachusetts Institute of Technology (MIT)

刊物名称：

MATHEMATICS OF OPERATIONS RESEARCH

ISSN/ISSBN：

0364-765X

DOI：

10.1287/moor.2022.1341

发表日期：

2023

页码：

2308-2336

关键词：

stochastic-approximation bounds aggregation algorithms estimators

摘要：

Linear fixed-point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to com-pute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak-Ruppert averaging. This bound consists of two terms: an approximation error term with an instance -dependent approximation factor and a statistical error term that captures the instance -specific complexity of the noise when projected onto the low-dimensional subspace. Using information-theoretic methods, we also establish lower bounds showing that both of these terms cannot be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, establishing their optimality.

来源URL：

访问原文