-
Authors: Ding, Yuhao; Zhang, Junzi; Lee, Hyunin; Lavaei, Javad
Affiliations: University of California System; University of California Berkeley
Abstract: Entropy regularization is an efficient technique for encouraging exploration and preventing premature convergence of (vanilla) policy gradient (PG) methods in reinforcement learning (RL). However, the theoretical understanding of entropy-regularized RL algorithms has been limited. In this article, we revisit the classical entropy-regularized PG methods with the soft-max policy parametrization, whose convergence has so far only been established assuming access to exact gradient oracles. To go...
-
Authors: Ma, Yuwen; Li, Xianwei; Li, Shaoyuan; Lin, Zongli
Affiliations: Shanghai Jiao Tong University; University of Virginia
Abstract: This article studies reduced-order dynamic consensus protocols for homogeneous linear multiagent systems using pure relative output information. By applying H∞ control theory, a separation-principle-like method with an additional small-gain constraint is proposed for designing an (n_x - n_u)th-order protocol, where n_x and n_u are the numbers of states and inputs of each agent, respectively. Existence conditions for the protocol are then systematically discussed from the graph, low-g...
-
Authors: Miller, Jared; Sznaier, Mario
Affiliations: University of Stuttgart; Northeastern University
Abstract: Common tasks in system analysis and control include optimal control, peak estimation, reachable set estimation, and maximum control invariant set estimation. A standard method to solve these problems is to lift them into infinite-dimensional convex linear programs. However, finite-dimensional truncations of these problems suffer from a curse of dimensionality with respect to the size of the state and input. In the case where the dynamical system is input-affine and the input is restricted to a conv...
-
Authors: Bemporad, Alberto
Affiliations: IMT School for Advanced Studies Lucca
Abstract: In this article, we propose a very efficient numerical method based on the Limited-memory Broyden-Fletcher-Goldfarb-Shanno with Box constraints (L-BFGS-B) algorithm for identifying linear and nonlinear discrete-time state-space models, possibly under L1 and group-Lasso regularization for reducing model complexity. For the identification of linear models, we show that, compared to classical methods, the approach often provides better results, is much more general in terms of the loss and regul...
-
Authors: Collins, Brandon C.; Xu, Shouhuai; Brown, Philip N.
Affiliations: University of Colorado System; University of Colorado at Colorado Springs
Abstract: The theory of learning in games has extensively studied situations where agents respond dynamically to each other in a static environment by optimizing a fixed utility function. However, real-world environments evolve as a result of past agent choices. Unfortunately, the analysis techniques that enabled a rich characterization of the emergent behavior of games played in static environments fail to cope with games played in dynamic environments. To address this problem, we develop a general fra...
-
Authors: Das, Ersin; Burdick, Joel W.
Affiliations: California Institute of Technology
Abstract: This article proposes a safety-critical control design approach for nonlinear control-affine systems in the presence of matched and unmatched uncertainties. Our constructive framework couples control barrier function (CBF) theory with a new uncertainty estimator to ensure robust safety. We use the estimated uncertainty, along with a derived upper bound on the estimation error, for synthesizing CBFs and safety-critical controllers via a quadratic program-based feedback control law that rigorous...
-
Authors: Xu, Yuchun; Zhang, Yanjun; Zhang, Ji-Feng
Affiliations: Chinese Academy of Sciences; Academy of Mathematics & System Sciences, CAS; University of Chinese Academy of Sciences, CAS; Beijing Institute of Technology; Zhongyuan University of Technology
Abstract: Dealing with the uncertain high-frequency gain matrix, denoted as K_p, is a fundamental problem in multivariable adaptive control systems. In this article, we propose a new solution for parameter estimation and adaptive control for a general class of multi-input-multi-output discrete-time linear time-invariant systems. The proposed scheme does not require any prior knowledge of the sign or bound information of K_p, and thus significantly relaxes the design conditions in traditional multivaria...
-
Authors: Jin, Yuqiang; Zhang, Wen-An; Lu, Xinyu; Chen, Bo; Yu, Li
Affiliations: Zhejiang University of Technology
Abstract: A set of particle filters on matrix Lie groups is presented for state estimation, where the particles live in the Lie algebra. The dynamical equations in the Lie algebraic form and the log-linear property of the invariant error are introduced separately to design two efficient particle time update schemes. All operations during the update are transferred to a vector space, and as few as a single mean particle needs to be propagated; the remaining particles are calculated by the error update on Lie alge...
-
Authors: Tsaousoglou, Georgios
Affiliations: Technical University of Denmark
Abstract: Of increasing relevance to engineering systems are problems that include online resource allocation to agents that feature adaptation and learning capabilities. This article considers the case where a coordinator gets to design a resource allocation mechanism (i.e., a bidding-allocation-rewards protocol) to efficiently allocate a resource to selfish agents that try to gain access by learning to communicate strategically. Toward aligning the agents' incentives with the social objective, a criti...
-
Authors: Wang, Bao; Zhu, Quanxin; Li, Subei
Affiliations: Xuzhou University of Technology; Hunan Normal University
Abstract: This article studies the stability analysis and stabilization problems for a class of discrete-time hidden Markov jump singular systems with partly known transition probabilities and emission probabilities. Novel sufficient conditions with original coefficient matrices are proposed to guarantee the regularity, causality, and stochastic stability of the considered systems. Based on such conditions, the linear matrix inequality-based asynchronous controller design method for the resulting ...