-
作者:Cai, Qi; Yang, Zhuoran; Lee, Jason D.; Wang, Zhaoran
作者单位:Northwestern University; Yale University; Princeton University
摘要:Temporal difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, because of the nonlinearity in value function approximation, such a coupling leads to nonconvexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bell...
-
作者:Guo, Lei; Ye, Jane J.; Zhang, Jin
作者单位:East China University of Science & Technology; University of Victoria; Southern University of Science & Technology; Peng Cheng Laboratory
摘要:In this paper, we perform a sensitivity analysis for the maximal value function, which is the optimal value function for a parametric maximization problem. Our aim is to study various subdifferentials for the maximal value function. We obtain upper estimates of Fre ' chet, limiting, and horizon subdifferentials of the maximal value function by using some sensitivity analysis techniques sophisticatedly. The derived upper estimates depend only on the union of all solutions and not on its convex ...
-
作者:Cesa-Bianchi, Nicolo; Cesari, Tommaso; Colomboni, Roberto; Fusco, Federico; Leonardi, Stefano
作者单位:University of Milan; University of Ottawa; Istituto Italiano di Tecnologia - IIT; Sapienza University Rome
摘要:Bilateral trade, a fundamental topic in economics, models the problem of intermediating between two strategic agents, a seller and a buyer, willing to trade a good for which they hold private valuations. In this paper, we cast the bilateral trade problem in a regret minimization framework over T rounds of seller/buyer interactions, with no prior knowledge on their private valuations. Our main contribution is a complete characterization of the regret regimes for fixed-price mechanisms with diff...
-
作者:Bednarz, Ewelina; Ernst, Philip A.; OseRkowskia, Adam
作者单位:University of Warsaw; Imperial College London
摘要:We consider the Brownian spider process, also known as Walsh Brownian motion, first introduced by J. B. Walsh [Walsh JB (1978) A diffusion with a discontinuous local time. Asterisque 52:37-45]. The paper provides the best constant Cn for the inequality root ffiffiffiffififfi ED tau <= Cn E tau ,where tau is the class of all adapted and integrable stopping times and D denotes the diameter of the spider process measured in terms of the British rail metric. This solves a variant of the long-stand...
-
作者:Lamperski, Jourdain; Freund, Robert M.; Todd, Michael J.
作者单位:Pennsylvania Commonwealth System of Higher Education (PCSHE); University of Pittsburgh; Massachusetts Institute of Technology (MIT); Cornell University
摘要:The ellipsoid algorithm is a fundamental algorithm for computing a solution to the system of m linear inequalities in n variables (P) : A(T)x <= u when its set of solutions has positive volume. However, when (P) is infeasible, the ellipsoid algorithm has no mechanism for proving that (P) is infeasible. This is in contrast to the other two fundamental algorithms for tackling (P), namely, the simplex and interior-point methods, each of which can be easily implemented in a way that either produce...
-
作者:Chen, Xi; Ma, Will; Simchi-Levi, David; Xin, Linwei
作者单位:New York University; Columbia University; Massachusetts Institute of Technology (MIT); Massachusetts Institute of Technology (MIT); Massachusetts Institute of Technology (MIT); University of Chicago
摘要:In this paper, we consider a personalized assortment planning problem under inventory constraints, where each arriving customer type is defined by a primary item of interest. As long as that item is in stock, the customer adds it to the shopping cart, at which point the retailer can recommend to the customer an assortment of add-ons to go along with the primary item. This problem is motivated by the new recommendation at checkout systems that have been deployed at many online retailers, and it...
-
作者:Nagarajan, Viswanath; Wang, Lily
作者单位:University of Michigan System; University of Michigan
摘要:We consider a general online network design problem in which a sequence of N requests arrive over time, each of which needs to use some subset of the available resources resource. The objective is to minimize the total cost P E. The cost incurred by a resource e E E is some function fe of the total load le on that eEE fe(le). We focus on cost functions that exhibit (dis)economies of scale. These functions are of the form fe(x) = sigma e + xi e center dot x alpha e if x > 0 (and zero if x = 0),...