Convergence and Sample Complexity of Policy Gradient Methods for Stabilizing Linear Systems
Document Type:
Article
Authors:
Zhao, Feiran; Fu, Xingyun; You, Keyou
Affiliation:
Tsinghua University
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2024.3455508
Publication Date:
2025
Pages:
1455-1466
Keywords:
Complexity theory
Costs
Convergence
Linear systems
Trajectory
Vectors
Search problems
Policy gradient (PG)
Sample complexity
Stabilization of linear systems
Discounted linear quadratic regulator (LQR)
Abstract:
System stabilization via policy gradient (PG) methods has drawn increasing attention in both the control and machine learning communities. In this article, we study their convergence and sample complexity for stabilizing linear time-invariant systems, measured in terms of the number of system rollouts. Our analysis is built upon a discounted linear quadratic regulator (LQR) method that alternately updates the policy and the discount factor of the LQR problem. First, we propose an explicit rule to adaptively adjust the discount factor by exploiting the stability margin of a linear control policy. Then, we establish the sample complexity of PG methods for stabilization, which only adds a coefficient logarithmic in the spectral radius of the state matrix to the complexity of solving the LQR problem with a prior stabilizing policy. Finally, we perform simulations to validate our theoretical findings and demonstrate the effectiveness of our method on a class of nonlinear systems.
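The abstract describes an alternating scheme: run policy gradient on a gamma-discounted LQR problem, enlarge gamma by exploiting the stability margin of the current gain, and repeat until gamma reaches 1, at which point the policy stabilizes the original system. Below is a minimal model-based Python sketch of this idea. It is not the authors' algorithm: it uses exact gradients instead of the paper's derivative-free rollout estimator, the discount-update rule and its constants are illustrative assumptions, and the system matrices are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

def pg_discounted_lqr(A, B, Q, R, gamma, K, steps=2000, lr=1e-3):
    """Exact policy gradient descent on the gamma-discounted LQR cost.
    Discounting is folded into the dynamics via a sqrt(gamma) scaling."""
    As, Bs = np.sqrt(gamma) * A, np.sqrt(gamma) * B
    for _ in range(steps):
        Acl = As - Bs @ K
        # Value matrix P_K of the closed loop (discrete Lyapunov equation).
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # State covariance under an identity initial-state covariance.
        Sigma = solve_discrete_lyapunov(Acl, np.eye(A.shape[0]))
        grad = 2 * ((R + Bs.T @ P @ Bs) @ K - Bs.T @ P @ As) @ Sigma
        K = K - lr * grad
    return K

def stabilize_via_pg(A, B, Q, R):
    """Alternate PG policy updates with increases of the discount factor
    until gamma reaches 1, so the returned gain stabilizes the true system."""
    n, m = B.shape
    K = np.zeros((m, n))
    # Pick gamma small enough that K = 0 stabilizes the discounted system.
    gamma = min(1.0, 0.5 / spectral_radius(A) ** 2)
    while True:
        K = pg_discounted_lqr(A, B, Q, R, gamma, K)
        if gamma >= 1.0:
            return K
        # Illustrative rule: grow gamma using the stability margin of K.
        # The 0.9 and 1.05 factors are assumptions, not the paper's rule.
        margin = 1.0 / spectral_radius(A - B @ K) ** 2
        gamma = min(1.0, max(1.05 * gamma, 0.9 * margin))

if __name__ == "__main__":
    A = np.array([[1.2, 0.5], [0.0, 1.1]])  # open-loop unstable: rho(A) = 1.2
    B = np.eye(2)
    K = stabilize_via_pg(A, B, np.eye(2), np.eye(2))
    print(spectral_radius(A - B @ K))  # < 1 once a stabilizing gain is found
```

The key design point mirrored here is that a gamma-discounted LQR is equivalent to an undiscounted one with dynamics scaled by sqrt(gamma), so any gain whose closed-loop spectral radius is below 1/sqrt(gamma) is a valid stabilizing initialization for the next, less-discounted subproblem.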