-
作者:Bartok, Gabor; Foster, Dean P.; Pal, David; Rakhlin, Alexander; Szepesvari, Csaba
作者单位:Swiss Federal Institutes of Technology Domain; ETH Zurich; Yahoo! Inc; Alphabet Inc.; Google Incorporated; University of Pennsylvania; University of Alberta
摘要:In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. In this paper we characterize the minimax regret of any partial monitoring game with...
-
作者:Baeuerle, Nicole; Rieder, Ulrich
作者单位:Helmholtz Association; Karlsruhe Institute of Technology; Ulm University
摘要:We investigate the problem of minimizing a certainty equivalent of the total or discounted cost over a finite and an infinite horizon that is generated by a Markov decision process (MDP). In contrast to a risk-neutral decision maker this optimization criterion takes the variability of the cost into account. It contains as a special case the classical risk-sensitive optimization criterion with an exponential utility. We show that this optimization problem can be solved by an ordinary MDP with e...
-
作者:Dunkel, Juliane; Schulz, Andreas S.
作者单位:Massachusetts Institute of Technology (MIT)
摘要:The question as to whether the Gomory-Chvatal closure of a nonrational polytope is a polytope has been a longstanding open problem in integer programming. In this paper, we answer this question in the affirmative by combining ideas from polyhedral theory and the geometry of numbers.
-
作者:Davis, Chad; Hare, Warren
作者单位:University of British Columbia
摘要:The normal cone to a constraint set plays a key role in optimization theory, algorithms, and applications. We consider the question of how to approximate the normal cone to a set under the assumption that the set is provided through an oracle function or collection of oracle functions, but contains some exploitable structure. We provide a new simplex gradient-based approximation technique that works for sets defined through a finite number of oracle-based functions. We further present novel re...
-
作者:Jaskiewicz, Anna; Matkowski, J.; Nowak, A. S.
作者单位:Wroclaw University of Science & Technology; University of Zielona Gora
摘要:In this paper we study a Markov decision process with a nonlinear discount function. First, we define a utility on the space of trajectories of the process in the finite and infinite time horizon and then take their expected values. It turns out that the associated optimization problem leads to a nonstationary dynamic programming and an infinite system of Bellman equations, which result in obtaining persistently optimal policies. Our theory is enriched by examples.
-
作者:Wiesemann, Wolfram; Kuhn, Daniel; Rustem, Berc
作者单位:Imperial College London
摘要:Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use because of their sensitivity to distributional model parameters, which are typically unknown and have to be estimated by the decision maker. To counter the detrimental effects of estimation errors, we consider robust MDPs that offer probabilistic guarantees in view of the unknown parameters. To this end, we assume that an observa...
-
作者:Tomala, Tristan
作者单位:Hautes Etudes Commerciales (HEC) Paris
摘要:This paper considers a general model of repeated games with incomplete information and imperfect monitoring. We study belief-free communication equilibria (BFCE) defined as follows. Players communicate with. a mediator who receives types and signals and recommends actions. A BFCE is a communication device such that all players have an incentive to play faithfully, irrespectively of their belief about the state. We characterize BFCE payoffs for any repeated game with incomplete information in t...
-
作者:Drapeau, Samuel; Kupper, Michael
作者单位:Humboldt University of Berlin
摘要:To address the plurality of interpretations of the subjective notion of risk, we describe it by means of a risk order and concentrate on the context invariant features of diversification and monotonicity. Our main results are uniquely characterized robust representations of lower semicontinuous risk orders on vector spaces and convex sets. This representation covers most instruments related to risk and allows for a differentiated interpretation depending on the underlying context that is illus...
-
作者:Blanchet, Jose
作者单位:Columbia University
摘要:We consider the problems of computing overflow probabilities at level N in any subset of stations in a Jackson network and of simulating sample paths conditional on overflow. We construct algorithms that take O(N) function evaluations to estimate such overflow probabilities within a prescribed relative accuracy and to simulate paths conditional on overflow at level N. The algorithms that we present are optimal in the sense that the best possible performance that can be expected for conditional...
-
作者:Dieker, A. B.; Shin, J.
作者单位:University System of Georgia; Georgia Institute of Technology; International Business Machines (IBM); IBM USA
摘要:We construct a generic, simple, and efficient scheduling policy for stochastic processing networks, and provide a general framework to establish its stability. Our policy is randomized and prioritized: with high probability it prioritizes jobs that have been least routed through the network. We show that the network is globally stable under this policy if there exists an appropriate quadratic local Lyapunov function that provides a negative drift with respect to nominal loads at servers. Apply...