-
作者:Xie, Qiaomin; Chen, Yudong; Wang, Zhaoran; Yang, Zhuoran
作者单位:University of Wisconsin System; University of Wisconsin Madison; University of Wisconsin System; University of Wisconsin Madison; Northwestern University; Yale University
摘要:We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problems are considered. In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap. In the online settin...
-
作者:Boodaghians, Shant; Fusco, Federico; Lazos, Philip; Leonardi, Stefano
作者单位:University of Illinois System; University of Illinois Urbana-Champaign; Sapienza University Rome
摘要:The Pandora's Box problem, originally posed by Weitzman in 1979, models selection from a set of random-valued alternatives-the boxes-when evaluation is costly. Weitzman showed that the Pandora's Box problem admits a simple threshold-based solution that considers the options in decreasing order of reservation value, a proxy for the actual value of the boxes in the exploration process. We study for the first time this problem when the order in which the boxes are opened is constrained, forcing t...
-
作者:Rutten, Daan; Mukherjeea, Debankur
作者单位:University System of Georgia; Georgia Institute of Technology
摘要:Consider a system with N identical single-server queues and a number of task types, where each server is able to process only a small subset of possible task types. Arriving tasks select d >= 2 random compatible servers and join the shortest queue among them. The compatibility constraints are captured by a fixed bipartite graph between the servers and the task types. When the graph is complete bipartite, the mean-field approximation is accurate. However, such dense compatibility graphs are inf...
-
作者:Saunderson, James
作者单位:Monash University
摘要:Every convex homogeneous polynomial (or form) is nonnegative. Blekherman has shown that there exist convex forms that are not sums of squares via a nonconstructive argument. We provide an explicit example of a convex form of degree 4 in 272 variables that is not a sum of squares. The form is related to the Cauchy-Schwarz inequality over the octonions. The proof uses symmetry reduction together with the fact (due to Blekherman) that forms of even degree that are near-constant on the unit sphere...
-
作者:Bogomolnaia, Anna; Moulin, Herve
作者单位:University of Glasgow; Centre National de la Recherche Scientifique (CNRS)
摘要:When dividing a manna Omega of private items (commodities, workloads, land, time slots) between n agents, the individual guarantee is the welfare each agent can secure in the worst case of other agents' preferences and actions. If the manna is nonatomic and utilities are continuous (not necessarily monotone or convex) the minmax utility, that of our agent's best share in the agent's worst partition of the manna, is guaranteed by Kuhn's generalization of divide and choose. The larger maxmin uti...
-
作者:Goyal, Vineet; Grand-Clement, Julien
作者单位:Columbia University
摘要:We consider a robust approach to address uncertainty in model parameters in Markov decision processes (MDPs), which are widely used to model dynamic optimization in many applications. Most prior works consider the case in which the uncertainty on transitions related to different states is uncoupled and the adversary is allowed to select the worst possible realization for each state unrelated to others, potentially leading to highly conservative solutions. On the other hand, the case of general...
-
作者:Gur, Eyal; Sabach, Shoham; Shtern, Shimrit
作者单位:Technion Israel Institute of Technology
摘要:We introduce a new algorithmic framework for solving nonconvex optimization problems, that is called nested alternating minimization, which aims at combining the classical alternating minimization technique with inner iterations of any optimization method. We provide a global convergence analysis of the new algorithmic framework to critical points of the problem at hand, which to the best of our knowledge, is the first of this kind for nested methods in the nonconvex setting. Central to our gl...
-
作者:Fibich, Gadi; Golan, Amit
作者单位:Tel Aviv University
摘要:Does a new product spread faster among heterogeneous or homogeneous consumers? We analyze this question using the stochastic discrete Bass model in which consumers may differ in their individual external influence rates {p(j)} and in their individual internal influence rates {q(j)}. When the network is complete and the heterogeneity is only manifested in {p(j)} or only in {q(j)}, it always slows down the diffusion, compared with the corresponding homogeneous network. When, however, consumers a...
-
作者:Gutman, David H.; Ho-Nguyen, Nam
作者单位:Texas Tech University System; Texas Tech University; University of Sydney
摘要:We extend coordinate descent to manifold domains and provide convergence analyses for geodesically convex and nonconvex smooth objective functions. Our key insight is to draw an analogy between coordinate blocks in Euclidean space and tangent subspaces of a manifold. Hence, our method is called tangent subspace descent (TSD). The core principle behind ensuring convergence of TSD is the appropriate choice of subspace at each iteration. To this end, we propose two novel conditions, the (C, r)-no...
-
作者:Zhang, Liwei; Zhang, Yule; Xiao, Xiantao; Wu, Jia
作者单位:Dalian University of Technology; Dalian Maritime University
摘要:This paper considers the problem of minimizing a convex expectation function over a closed convex set, coupled with a set of inequality convex expectation constraints. We present a new stochastic approximation proximal method of multipliers to solve this convex stochastic optimization problem. We analyze regrets of the proposed method for solving convex stochastic optimization problems. Under mild conditions, we show that this method exhibits sublinear regret for both objective reduction and c...