Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach

Publication type:

Article

Authors:

Li, Yingying; Tang, Yujie; Zhang, Runyu; Li, Na

Affiliation:

Harvard University

Journal:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL

ISSN/ISBN:

0018-9286

DOI:

10.1109/TAC.2021.3128592

Publication date:

2022

Pages:

6429-6444

Keywords:

Distributed reinforcement learning (RL); linear quadratic regulator (LQR); zero-order optimization

Abstract:

This article considers a distributed reinforcement learning problem for decentralized linear quadratic (LQ) control with partial state observations and local costs. We propose a zero-order distributed policy optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization, and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus, and then performs local policy gradient updates in parallel based on zero-order gradient estimation. ZODPO requires only limited communication and storage, even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial in the inverse of the error tolerance and in the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing with high probability. Finally, we numerically test ZODPO on multizone HVAC systems.
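
The two ingredients named in the abstract, consensus on the global cost and a zero-order gradient step on each agent's own parameters, can be illustrated with a small Python sketch. This is not the authors' ZODPO implementation: the toy static cost in `local_costs` replaces the LQ dynamics, the global cost is taken to be the average of the local costs, the perturbation direction is sampled centrally rather than in a distributed way, and the ring network, function names, and numeric settings are all assumptions made for illustration.

```python
import numpy as np


def ring_weights(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 agents (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def consensus_average(local_values, W, num_rounds):
    """Each round, every agent replaces its value with a weighted
    average of its own value and its neighbors' values."""
    x = np.asarray(local_values, dtype=float)
    for _ in range(num_rounds):
        x = W @ x
    return x


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    u = rng.standard_normal(dim)
    return u / np.linalg.norm(u)


def local_costs(theta, targets):
    """Hypothetical per-agent costs: a toy stand-in, NOT an LQ cost."""
    coupling = 0.1 * (theta - np.roll(theta, 1)) ** 2
    return (theta - targets) ** 2 + coupling


def zodpo_like_step(theta, targets, W, radius, step_size, rounds, rng):
    """One iteration: perturb, observe local costs, agree on the
    global cost by consensus, then update each agent's own entry."""
    n = theta.size
    u = sample_unit_sphere(n, rng)  # sampled centrally here for simplicity
    observed = local_costs(theta + radius * u, targets)
    global_est = consensus_average(observed, W, rounds)  # one estimate per agent
    # One-point zero-order gradient estimate; entry i uses agent i's estimate.
    grad_est = (n / radius) * global_est * u
    return theta - step_size * grad_est


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    W = ring_weights(n)
    targets = np.array([1.0, -1.0, 0.5, 0.0])
    theta = np.zeros(n)
    for _ in range(2000):
        theta = zodpo_like_step(theta, targets, W, 0.5, 0.002, 30, rng)
    print("learned parameters:", theta)
```

Each agent in this sketch stores only its own parameter, its own perturbation entry, and a running cost estimate, which is in the spirit of the limited communication and storage the abstract mentions. One-point estimates are noisy, so the printed parameters should be read as a rough illustration rather than a converged solution.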