Asynchronous Parallel Policy Gradient Methods for the Linear Quadratic Regulator

Publication type:
Article
Authors:
Zhao, Feiran; Sha, Xingyu; You, Keyou
Affiliations:
Tsinghua University
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2025.3543128
Publication date:
2025
Pages:
4920-4927
Keywords:
Convergence; Vectors; Costs; Accuracy; Regulators; Gradient methods; Delays; Training; Iterative methods; Data mining; Asynchronous parallel methods; linear quadratic regulator (LQR); linear system; policy gradient (PG)
Abstract:
Learning policies in an asynchronous parallel fashion is essential to many successes of reinforcement learning in solving complex problems; however, the convergence of such methods has not been rigorously established. To improve the theoretical understanding, we adopt the asynchronous parallel zero-order policy gradient (AZOPG) method to solve the continuous-time linear quadratic regulation problem. Specifically, multiple workers independently perform system rollouts to estimate zero-order policy gradients (PGs), which are then aggregated at a central node for policy updates. Moreover, each worker is allowed to interact with the central node asynchronously, which leads to delayed PG estimates. By quantifying the convergence rate of AZOPG, we show a linear speedup property, both in theory and in simulation, which reveals the advantage of using asynchronous parallel workers in learning policies.
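The abstract describes a pipeline in which workers estimate zero-order PGs from rollouts of a perturbed policy while a central node applies each estimate as it arrives, possibly computed at a stale policy. The Python sketch below illustrates that loop under stated assumptions only: the dynamics (A, B), costs (Q, R), smoothing radius, step size, and round-robin delay model are illustrative placeholders, not the paper's algorithm or parameters.

```python
import numpy as np

# Illustrative problem data; NOT the system or parameters from the paper.
rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [-1.0, 0.5]])  # continuous-time dynamics dx = (Ax + Bu) dt
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)              # quadratic state/input cost weights
dt, horizon = 0.01, 300                  # Euler discretization of each rollout

def rollout_cost(K, x0):
    """Finite-horizon LQR cost of the policy u = -Kx from initial state x0."""
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u) * dt
        x = x + (A @ x + B @ u) * dt
    return cost

def zero_order_pg(K, radius=0.05):
    """One-point zero-order PG estimate: roll out a policy perturbed on a
    sphere of the given radius and scale the observed cost (a standard
    smoothing estimator; the paper's exact estimator may differ)."""
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)
    x0 = rng.standard_normal(2)
    return (K.size / radius) * rollout_cost(K + radius * U, x0) * U

# Central node: apply each worker's PG estimate as it arrives. Round-robin
# arrivals mean every gradient was computed at a policy that is a few
# updates stale, mimicking the bounded delays described in the abstract.
K = np.array([[1.0, 1.0]])               # a stabilizing initial gain
step, n_workers = 5e-5, 4
stale = [K.copy() for _ in range(n_workers)]

for it in range(400):
    w = it % n_workers                   # worker whose estimate arrives now
    g = zero_order_pg(stale[w])          # PG computed at a delayed policy
    K = K - step * g                     # central policy update
    stale[w] = K.copy()                  # worker then pulls the fresh policy

print("cost after training:", rollout_cost(K, np.ones(2)))
```

In this round-robin model each gradient is exactly n_workers - 1 updates stale; the paper analyzes more general bounded delays, and a real deployment would also need safeguards (e.g., step-size tuning to keep iterates stabilizing) that this sketch omits.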