-
Authors: Audibert, Jean-Yves; Bubeck, Sebastien; Lugosi, Gabor
Affiliations: Centre National de la Recherche Scientifique (CNRS); Universite PSL; Ecole Normale Superieure (ENS); Inria; Princeton University; ICREA; Pompeu Fabra University
Abstract: We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the minimal loss she would have achieved by picking, in hindsight, the best possible action. Our goal is to understand the magnitude of the best possible (minimax) regret. We study the problem under three different assumptions for the feedback the decision maker receives: full informati...
-
Authors: Bartok, Gabor; Foster, Dean P.; Pal, David; Rakhlin, Alexander; Szepesvari, Csaba
Affiliations: Swiss Federal Institutes of Technology Domain; ETH Zurich; Yahoo! Inc; Alphabet Inc.; Google Incorporated; University of Pennsylvania; University of Alberta
Abstract: In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. In this paper we characterize the minimax regret of any partial monitoring game with...
-
Authors: Baeuerle, Nicole; Rieder, Ulrich
Affiliations: Helmholtz Association; Karlsruhe Institute of Technology; Ulm University
Abstract: We investigate the problem of minimizing a certainty equivalent of the total or discounted cost over a finite and an infinite horizon that is generated by a Markov decision process (MDP). In contrast to a risk-neutral decision maker this optimization criterion takes the variability of the cost into account. It contains as a special case the classical risk-sensitive optimization criterion with an exponential utility. We show that this optimization problem can be solved by an ordinary MDP with e...