Equivalence of Optimality Criteria for Markov Decision Process and Model Predictive Control

Type:
Article
Authors:
Kordabad, Arash Bahari; Zanon, Mario; Gros, Sebastien
Affiliations:
Norwegian University of Science & Technology (NTNU); IMT School for Advanced Studies Lucca
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2023.3277309
Publication date:
2024
Pages:
1149-1156
Keywords:
Costs; Markov processes; trajectory; mathematical models; stability criteria; predictive models; computational modeling; Markov decision process (MDP); model predictive control (MPC); optimality; reinforcement learning (RL)
Abstract:
This article shows that the optimal policy and value functions of a Markov decision process (MDP), whether discounted or not, can be captured by a finite-horizon undiscounted optimal control problem (OCP), even one based on an inexact model. This can be achieved by selecting a proper stage cost and terminal cost for the OCP. A particularly useful instance of such an OCP is a model predictive control (MPC) scheme in which a deterministic (possibly nonlinear) model is used to reduce the computational complexity. This observation leads us to fully parameterize an MPC scheme, including its cost function. In practice, reinforcement learning algorithms can then be used to tune the parameterized MPC scheme. We verify the developed theorems analytically for an LQR case and investigate further nonlinear examples in simulation.
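The LQR verification mentioned in the abstract can be illustrated in a simple way (a minimal sketch under assumed scalar dynamics, not the paper's actual construction): for a scalar linear-quadratic problem, a finite-horizon undiscounted OCP whose terminal cost equals the infinite-horizon value function reproduces the infinite-horizon optimal feedback gain at every stage.

```python
# Hypothetical scalar LQR illustration: x+ = a*x + b*u, stage cost q*x^2 + r*u^2.
# All numbers below are assumptions chosen for the sketch.

def riccati_step(p, a, b, q, r):
    """One backward Riccati recursion for the scalar cost-to-go p."""
    return q + a * p * a - (a * p * b) ** 2 / (r + b * p * b)

def gain(p, a, b, r):
    """Optimal feedback gain u = -k*x induced by cost-to-go p."""
    return (b * p * a) / (r + b * p * b)

a, b, q, r = 1.2, 1.0, 1.0, 0.5

# Infinite-horizon value function: iterate the Riccati map to a fixed point.
p_inf = q
for _ in range(1000):
    p_inf = riccati_step(p_inf, a, b, q, r)
k_inf = gain(p_inf, a, b, r)

# Finite-horizon OCP (N = 5) with terminal cost p_inf: backward recursion.
p = p_inf
gains = []
for _ in range(5):
    gains.append(gain(p, a, b, r))
    p = riccati_step(p, a, b, q, r)

# Every stage gain of the finite-horizon scheme matches the
# infinite-horizon optimal gain (up to fixed-point convergence error).
print(max(abs(k - k_inf) for k in gains))
```

With the terminal cost set to the converged cost-to-go, the backward recursion stays at the fixed point, so the printed deviation is numerically zero; with any other terminal cost the finite-horizon gains would differ stage by stage.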
Source URL: