A Lyapunov Theory for Finite-Sample Guarantees of Markovian Stochastic Approximation
Publication type:
Article
Authors:
Chen, Zaiwei; Maguluri, Siva T.; Shakkottai, Sanjay; Shanmugam, Karthikeyan
Affiliations:
University System of Georgia; Georgia Institute of Technology; University of Texas System; University of Texas Austin
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.2022.0249
Publication date:
2024
Pages:
1352-1367
Keywords:
time analysis
optimization
scheme
Abstract:
This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian stochastic approximation (SA) algorithm under a contraction operator with respect to an arbitrary norm. The main novelty lies in the construction of a valid Lyapunov function called the generalized Moreau envelope. The smoothness and an approximation property of the generalized Moreau envelope enable us to derive a one-step Lyapunov drift inequality, which is the key to establishing the finite-sample bounds. Our SA result has wide applications, especially in the context of reinforcement learning (RL). Specifically, we show that a large class of value-based RL algorithms can be modeled in the exact form of our Markovian SA algorithm. Therefore, our SA results immediately imply finite-sample guarantees for popular RL algorithms such as n-step temporal difference (TD) learning, TD(lambda), off-policy V-trace, and Q-learning. As a byproduct, by analyzing the convergence bounds of n-step TD and TD(lambda), we provide theoretical insight into the efficiency of bootstrapping. Moreover, our finite-sample bounds for off-policy V-trace explicitly capture the tradeoff between the variance of the stochastic iterates and the bias in the limit.
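The following LaTeX sketch illustrates the kind of objects the abstract describes: a contractive Markovian SA iteration and a generalized-Moreau-envelope Lyapunov function. It is a minimal sketch assuming standard forms consistent with the abstract, not the paper's verbatim statements; the symbols epsilon_k, theta, gamma_c, and the drift constant c are illustrative notation introduced here.

% Illustrative sketch (assumed notation, not taken verbatim from the paper).
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Markovian SA update: F(.,y) is the operator, {Y_k} the underlying Markov chain,
% and the stationary average of F is assumed to be a gamma_c-contraction in ||.||_c.
\begin{equation*}
  x_{k+1} = x_k + \epsilon_k \bigl( F(x_k, Y_k) - x_k \bigr),
  \qquad \bar{F}(x) = \mathbb{E}_{Y \sim \mu}\bigl[ F(x, Y) \bigr].
\end{equation*}
% Generalized Moreau envelope of f(x) = (1/2)||x||_c^2 with a smooth norm-squared g,
% used as a smooth surrogate Lyapunov function that approximates f for small theta.
\begin{equation*}
  M_f^{\theta, g}(x) \;=\; \min_{u \in \mathbb{R}^d}
  \Bigl\{ f(u) + \tfrac{1}{\theta}\, g(x - u) \Bigr\}.
\end{equation*}
% Smoothness and the approximation property yield a one-step drift inequality
% of (roughly) the following form, which drives the finite-sample bounds:
\begin{equation*}
  \mathbb{E}\bigl[ M_f^{\theta, g}(x_{k+1} - x^{\ast}) \bigr]
  \;\le\; (1 - c\,\epsilon_k)\, \mathbb{E}\bigl[ M_f^{\theta, g}(x_k - x^{\ast}) \bigr]
  + O(\epsilon_k^2).
\end{equation*}
\end{document}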
Source URL: