-
Authors: Dorobantu, Victor D.; Azizzadenesheli, Kamyar; Yue, Yisong
Affiliations: California Institute of Technology; Purdue University System; Purdue University; Nvidia Corporation
Abstract: We study policy optimization problems for deterministic Markov decision processes (MDPs) with metric state and action spaces, which we refer to as metric policy optimization problems (MPOPs). Our goal is to establish theoretical results on the well-posedness of MPOPs that can characterize practically relevant continuous control systems. To do so, we define a special class of MPOPs called compactly restrictable MPOPs (CR-MPOPs), which are flexible enough to capture the complex behavior of robot...
-
Authors: Chawla, Ronshee; Sankararaman, Abishek; Shakkottai, Sanjay
Affiliations: University of Texas System; University of Texas Austin; University of California System; University of California Berkeley; University of Texas System; University of Texas Austin
Abstract: We study a multiagent stochastic linear bandit with side information, parameterized by an unknown vector θ* ∈ ℝ^d. The side information consists of a finite collection of low-dimensional subspaces, one of which contains θ*. In our setting, agents can collaborate to reduce regret by sending recommendations across a communication graph connecting them. We present a novel decentralized algorithm, where agents communicate subspace indices with each other and each agent plays a projected varian...
-
Authors: Furieri, Luca; Guo, Baiwei; Martin, Andrea; Ferrari-Trecate, Giancarlo
Affiliations: Swiss Federal Institutes of Technology Domain; Ecole Polytechnique Federale de Lausanne
Abstract: As we transition toward the deployment of data-driven controllers for black-box cyberphysical systems, complying with hard safety constraints becomes a primary concern. Two key aspects should be addressed when input-output data are corrupted by noise: how much uncertainty can one tolerate without compromising safety, and to what extent is the control performance affected? By focusing on finite-horizon constrained linear-quadratic problems, we provide an answer to these questions in terms of t...
-
Authors: Massiani, Pierre-Francois; Heim, Steve; Solowjow, Friedrich; Trimpe, Sebastian
Affiliations: RWTH Aachen University; Max Planck Society; Massachusetts Institute of Technology (MIT)
Abstract: Safety constraints and optimality are important but sometimes conflicting criteria for controllers. Although these criteria are often addressed separately with different tools to maintain formal guarantees, it is also common practice in reinforcement learning (RL) to simply modify reward functions by penalizing failures, with the penalty treated as a mere heuristic. We rigorously examine the relationship of both safety and optimality to penalties, and formalize sufficient conditions for safe valu...
-
Authors: Galimberti, Clara Lucia; Furieri, Luca; Xu, Liang; Ferrari-Trecate, Giancarlo
Affiliations: Swiss Federal Institutes of Technology Domain; Ecole Polytechnique Federale de Lausanne; Shanghai University
Abstract: Training deep neural networks (DNNs) can be difficult due to vanishing and exploding gradients during weight optimization through backpropagation. To address this problem, we propose a general class of Hamiltonian DNNs (H-DNNs) that stem from the discretization of continuous-time Hamiltonian systems and include several existing DNN architectures based on ordinary differential equations. Our main result is that a broad set of H-DNNs ensures nonvanishing gradients by design for an arbitrary netw...
-
Authors: Zehfroosh, Ashkan; Tanner, Herbert G.
Affiliations: University of Delaware
Abstract: This article presents a theoretical framework for probably approximately correct (PAC) multi-agent reinforcement learning (MARL) algorithms for Markov games. Using the idea of delayed Q-learning, this article extends the well-known Nash Q-learning algorithm to build a new PAC MARL algorithm for general-sum Markov games. In addition to guiding the design of a provably PAC MARL algorithm, the framework enables checking whether an arbitrary MARL algorithm is PAC. Comparative numerical results dem...
-
Authors: Jongeneel, Wouter; Sutter, Tobias; Kuhn, Daniel
Affiliations: Swiss Federal Institutes of Technology Domain; Ecole Polytechnique Federale de Lausanne; University of Konstanz
Abstract: We propose a principled method for projecting an arbitrary square matrix to the nonconvex set of asymptotically stable matrices. Leveraging ideas from large deviations theory, we show that this projection is optimal in an information-theoretic sense and that it simply amounts to shifting the initial matrix by an optimal linear quadratic feedback gain, which can be computed exactly and highly efficiently by solving a standard linear quadratic regulator problem. The proposed approach allows us t...
-
Authors: Greene, Max L.; Bell, Zachary I.; Nivison, Scott; Dixon, Warren E.
Affiliations: Johns Hopkins University; State University System of Florida; University of Florida
Abstract: The infinite horizon optimal tracking problem is solved for a deterministic, control-affine, unknown nonlinear dynamical system. A deep neural network (DNN) is updated in real time to approximate the unknown nonlinear system dynamics. The developed framework uses a multitimescale concurrent learning-based weight update policy, with which the output layer DNN weights are updated in real time, but the internal DNN features are updated discretely and at a slower timescale (i.e., with batch-like u...
-
Authors: Scroccaro, Pedro Zattoni; Kolarijani, Arman Sharifi; Esfahani, Peyman Mohajerin
Affiliations: Delft University of Technology
Abstract: In the past few years, online convex optimization (OCO) has received notable attention in the control literature thanks to its flexible real-time nature and powerful performance guarantees. In this article, we propose new step-size rules and OCO algorithms that simultaneously exploit gradient predictions, function predictions, and dynamics, features particularly pertinent to control applications. The proposed algorithms enjoy static and dynamic regret bounds in terms of the dynamics of the refe...
-
Authors: Goel, Gautam; Hassibi, Babak
Affiliations: University of California System; University of California Berkeley; California Institute of Technology
Abstract: In this article, we consider estimation and control in linear dynamical systems from the perspective of regret minimization. Unlike most prior work in this area, we focus on the problem of designing causal state estimators and causal controllers, which compete against a clairvoyant noncausal policy, instead of the best policy selected in hindsight from some fixed parametric class. We show that regret-optimal filters and regret-optimal controllers can be derived in state space form using operat...