Regret Analysis of Learning-Based MPC With Partially Unknown Cost Function

Document type:
Article
Authors:
Dogan, Ilgin; Shen, Zuo-Jun Max; Aswani, Anil
Affiliations:
University of California System; University of California Berkeley; University of Hong Kong
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2023.3328827
Publication date:
2024
Pages:
3246-3253
Keywords:
Cost function; Costs; Linear systems; HVAC; Control systems; Adaptation models; Ventilation; learning-based control; model-predictive control (MPC); nonmyopic exploitation; restless bandits
Abstract:
The exploration-exploitation tradeoff is an inherent challenge in data-driven adaptive control. Though this tradeoff has been studied for multiarmed bandits (MABs) and reinforcement learning for linear systems, it is less well studied for learning-based control of nonlinear systems. A significant theoretical challenge in the nonlinear setting is that there is no explicit characterization of an optimal controller for a given set of cost and system parameters. We propose the use of a finite-horizon oracle controller with full knowledge of parameters as a reasonable surrogate to an optimal controller. This allows us to develop policies in the context of learning-based model-predictive control (MPC) and conduct a control-theoretic analysis using techniques from MPC and optimization theory to show that these policies achieve low regret with respect to this finite-horizon oracle. Our simulations exhibit the low regret of our policy on a heating, ventilation, and air-conditioning model with partially unknown cost function.
Source URL: