您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 管理科学与工程 > IEEE Transactions on Automatic Control > 2024 > 1期

Parallel and Flexible Dynamic Programming via the Mini-Batch Bellman Operator

成果类型：

Article

署名作者：

Gargiani, Matilde; Martinelli, Andrea; Martinez, Max Ruts; Lygeros, John

署名单位：

Swiss Federal Institutes of Technology Domain; ETH Zurich; Swiss Federal Institutes of Technology Domain; ETH Zurich

刊物名称：

IEEE TRANSACTIONS ON AUTOMATIC CONTROL

ISSN/ISSBN：

0018-9286

DOI：

10.1109/TAC.2023.3270060

发表日期：

2024

页码：

455-462

关键词：

convergence COSTS dynamic programming optimal control cost function STANDARDS PROCESS CONTROL algorithms Dynamic programming (DP) parallel programming

摘要：

The Bellman operator constitutes the foundation of dynamic programming (DP). An alternative is presented by the Gauss-Seidel operator, whose evaluation, differently from that of the Bellman operator where the states are all processed at once, updates one state at a time while incorporating into the computation the interim results. The provably better convergence rate of DP methods based on the Gauss-Seidel operator comes at the price of an inherent sequentiality, which prevents the exploitation of modern multicore systems. In this work, we propose a new operator for DP, namely, the mini-batch Bellman operator, which aims at realizing the tradeoff between the better convergence rate of the methods based on the Gauss-Seidel operator and the parallelization capability offered by the Bellman operator. After the introduction of the new operator, a theoretical analysis for validating its fundamental properties is conducted. Such properties allow one to successfully deploy the new operator in the main DP schemes, such as value iteration and modified policy iteration. We compare the convergence of the DP algorithm based on the new operator with its earlier counterparts, shedding light on the algorithmic advantages of the new formulation and the impact of the batch-size parameter on the convergence. Finally, an extensive numerical evaluation of the newly introduced operator is conducted. In accordance with the theoretical derivations, the numerical results show the competitive performance of the proposed operator and its superior flexibility, which allows one to adapt the efficiency of its iterations to different structures of MDPs and hardware setups.