Asynchronous Parallel Policy Gradient Methods for the Linear Quadratic Regulator

Publication type:
Article
Authors:
Zhao, Feiran; Sha, Xingyu; You, Keyou
Affiliations:
Tsinghua University
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2025.3543128
Publication date:
2025
Pages:
4920-4927
Keywords:
Convergence; Vectors; Costs; Accuracy; Regulators; Gradient methods; Delays; Training; Iterative methods; Data mining; Asynchronous parallel methods; linear quadratic regulator (LQR); linear system; policy gradient (PG)
Abstract:
Learning policies in an asynchronous parallel fashion is essential to many successes of reinforcement learning in solving complex problems; however, the convergence of such methods has not been rigorously established. To improve the theoretical understanding, we adopt the asynchronous parallel zero-order policy gradient (AZOPG) method to solve the continuous-time linear quadratic regulation problem. Specifically, multiple workers independently perform system rollouts to estimate zero-order policy gradients (PGs), which are then aggregated at a central node for policy updates. Moreover, each worker is allowed to interact with the central node asynchronously, which leads to delayed PG estimates. By quantifying the convergence rate of AZOPG, we show a linear speedup property, both in theory and in simulation, which reveals the advantage of using asynchronous parallel workers in learning policies.
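The abstract describes a pipeline in which workers estimate zero-order PGs from rollouts of a perturbed policy while a central node applies each estimate as it arrives, possibly computed at a stale policy. The Python sketch below illustrates that loop under stated assumptions only: the dynamics (A, B), costs (Q, R), smoothing radius, step size, and round-robin delay model are illustrative placeholders, not the paper's algorithm or parameters.

```python
import numpy as np

# Illustrative problem data; NOT the system or parameters from the paper.
rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [-1.0, 0.5]])  # continuous-time dynamics dx = (Ax + Bu) dt
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)              # quadratic state/input cost weights
dt, horizon = 0.01, 300                  # Euler discretization of each rollout

def rollout_cost(K, x0):
    """Finite-horizon LQR cost of the policy u = -Kx from initial state x0."""
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u) * dt
        x = x + (A @ x + B @ u) * dt
    return cost

def zero_order_pg(K, radius=0.05):
    """One-point zero-order PG estimate: roll out a policy perturbed on a
    sphere of the given radius and scale the observed cost (a standard
    smoothing estimator; the paper's exact estimator may differ)."""
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)
    x0 = rng.standard_normal(2)
    return (K.size / radius) * rollout_cost(K + radius * U, x0) * U

# Central node: apply each worker's PG estimate as it arrives. Round-robin
# arrivals mean every gradient was computed at a policy that is a few
# updates stale, mimicking the bounded delays described in the abstract.
K = np.array([[1.0, 1.0]])               # a stabilizing initial gain
step, n_workers = 5e-5, 4
stale = [K.copy() for _ in range(n_workers)]

for it in range(400):
    w = it % n_workers                   # worker whose estimate arrives now
    g = zero_order_pg(stale[w])          # PG computed at a delayed policy
    K = K - step * g                     # central policy update
    stale[w] = K.copy()                  # worker then pulls the fresh policy

print("cost after training:", rollout_cost(K, np.ones(2)))
```

In this round-robin model each gradient is exactly n_workers - 1 updates stale; the paper analyzes more general bounded delays, and a real deployment would also need safeguards (e.g., step-size tuning to keep iterates stabilizing) that this sketch omits.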