Parallel and Flexible Dynamic Programming via the Mini-Batch Bellman Operator
成果类型:
Article
署名作者:
Gargiani, Matilde; Martinelli, Andrea; Martinez, Max Ruts; Lygeros, John
署名单位:
Swiss Federal Institutes of Technology Domain; ETH Zurich; Swiss Federal Institutes of Technology Domain; ETH Zurich
刊物名称:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISSBN:
0018-9286
DOI:
10.1109/TAC.2023.3270060
发表日期:
2024
页码:
455-462
关键词:
convergence
COSTS
dynamic programming
optimal control
cost function
STANDARDS
PROCESS CONTROL
algorithms
Dynamic programming (DP)
parallel programming
摘要:
The Bellman operator constitutes the foundation of dynamic programming (DP). An alternative is presented by the Gauss-Seidel operator, whose evaluation, differently from that of the Bellman operator where the states are all processed at once, updates one state at a time while incorporating into the computation the interim results. The provably better convergence rate of DP methods based on the Gauss-Seidel operator comes at the price of an inherent sequentiality, which prevents the exploitation of modern multicore systems. In this work, we propose a new operator for DP, namely, the mini-batch Bellman operator, which aims at realizing the tradeoff between the better convergence rate of the methods based on the Gauss-Seidel operator and the parallelization capability offered by the Bellman operator. After the introduction of the new operator, a theoretical analysis for validating its fundamental properties is conducted. Such properties allow one to successfully deploy the new operator in the main DP schemes, such as value iteration and modified policy iteration. We compare the convergence of the DP algorithm based on the new operator with its earlier counterparts, shedding light on the algorithmic advantages of the new formulation and the impact of the batch-size parameter on the convergence. Finally, an extensive numerical evaluation of the newly introduced operator is conducted. In accordance with the theoretical derivations, the numerical results show the competitive performance of the proposed operator and its superior flexibility, which allows one to adapt the efficiency of its iterations to different structures of MDPs and hardware setups.