-
Authors: Khaleghei, Akram; Kim, Michael Jong
Affiliations: University of Toronto; University of British Columbia
Abstract: We formulate a maintenance control model as an optimal stopping problem under partial observations. The key challenge in our formulation is that the underlying state process is not restricted to be Markovian but rather is allowed to follow a semi-Markov process, which is more realistic in practice. Consequently, the stopping problem is not representable as a partially observable Markov decision process (POMDP) with finite state space, a commonly adopted modeling framework in the maintenance op...
-
Authors: Wu, Zijun; Moehring, Rolf H.; Chen, Yanyan; Xu, Dachuan
Affiliations: Hefei University; Technical University of Berlin; Beijing University of Technology; Beijing University of Technology
Abstract: We investigate the price of anarchy (PoA) in nonatomic congestion games when the total demand T gets very large. First results in this direction have recently been obtained by Colini-Baldeschi et al. (2016, 2017, 2020) for routing games and show that the PoA converges to one when the growth of the total demand T satisfies certain regularity conditions. We extend their results by developing a new framework for the limit analysis of the PoA that offers strong techniques such as the limit of game...
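The convergence of the PoA to one under growing demand can be illustrated on the classic two-link Pigou network. This is a textbook toy example chosen here for illustration, not the framework developed in the paper. The sketch assumes one link with constant latency 1, one link with latency equal to its flow, and total demand T > 0; the function name `pigou_poa` is hypothetical.

```python
def pigou_poa(total_demand):
    """Price of anarchy in a two-link Pigou network (illustrative assumption).

    Link A has constant latency 1; link B has latency equal to its flow x.
    total_demand must be positive.
    """
    T = float(total_demand)

    # Wardrop equilibrium: users fill link B until its latency reaches 1,
    # and any remaining demand uses link A.
    x_eq = min(T, 1.0)
    eq_cost = x_eq * x_eq + (T - x_eq) * 1.0

    # Social optimum: minimize x^2 + (T - x) over 0 <= x <= T,
    # which gives x = 1/2 whenever T >= 1/2.
    x_opt = min(T, 0.5)
    opt_cost = x_opt * x_opt + (T - x_opt) * 1.0

    return eq_cost / opt_cost


for T in (1, 10, 100, 1000):
    print(T, pigou_poa(T))
```

For T >= 1 the ratio equals T / (T - 1/4), so it starts at 4/3 for T = 1 and decreases toward one as T grows.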
-
Authors: Cen, Shicong; Cheng, Chen; Chen, Yuxin; Wei, Yuting; Chi, Yuejie
Affiliations: Carnegie Mellon University; Stanford University; Princeton University; University of Pennsylvania
Abstract: Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization (an algorithmic scheme that encourages exploration) and is closely related to soft policy iteration and trust region policy optimization. Despite the empirical success, the theoretical underpinnings for NPG methods remain limited even for the tabular setting. This paper develop...
-
Authors: Wang, Zhengli; Zenios, Stefanos
Affiliations: University of Hong Kong; Stanford University
Abstract: We model the creation of a new venture with a novel drift-variance diffusion control framework in which the state of the venture is captured by a diffusion process. The entrepreneur creating the venture chooses costly controls, which determine both the drift and the variance of the process. When the process reaches an upper boundary, the venture succeeds and the entrepreneur receives a reward. When the process reaches a lower boundary, the venture fails. The entrepreneur can choose between two...
-
Authors: Balseiro, Santiago R.; Kim, Anthony; Russo, Daniel
Affiliations: Columbia University; Amazon.com
Abstract: We consider a principal who repeatedly interacts with a strategic agent holding private information. In each round, the agent observes an idiosyncratic shock drawn independently and identically from a distribution known to the agent but not to the principal. The utilities of the principal and the agent are determined by the values of the shock and outcomes that are chosen by the principal based on reports made by the agent. When the principal commits to a dynamic mechanism, the agent best-resp...
-
Authors: Bertsimas, Dimitris; Ng, Yee Sian; Yan, Julia
Affiliations: Massachusetts Institute of Technology (MIT); Massachusetts Institute of Technology (MIT)
Abstract: Mass transit remains the most efficient way to service a densely packed commuter population. However, reliability issues and increasing competition in the transportation space have led to declining ridership across the United States, and transit agencies must also operate under tight budget constraints. Recent attempts at using bus network redesign to improve ridership have attracted attention from various transit authorities. However, the analysis seems to rely on ad hoc methods, for example,...
-
Authors: Alptekinoglu, Aydin; Semple, John H.
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park; Southern Methodist University
Abstract: We investigate analytical and empirical properties of the Heteroscedastic Exponomial Choice (HEC) model to lay the groundwork for its use in theoretical and empirical studies that build demand models on a discrete choice foundation. The HEC model generalizes the Exponomial Choice (EC) model by including choice-specific variances for the random components of utility (the error terms). We show that the HEC model inherits some of the properties found in the EC model: closed-form choice probabilit...
-
Authors: Bhandari, Jalaj; Russo, Daniel; Singal, Raghav
Affiliations: Columbia University; Columbia University
Abstract: Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its theoretical analysis has proved challenging and few guarantees on its statistical efficiency are available. In this work, we provide a simple and explicit finite time analysis of temporal difference learning with linear function approximation. Excep...
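As a rough illustration of the algorithm this abstract analyzes, the sketch below runs the standard TD(0) update with a linear value estimate V(s) = phi[s] @ theta. It is a minimal textbook version, not the authors' analysis or code. The two-state chain, its rewards, the discount factor, the step size, and the function name `td0_linear` are all assumptions made for this example.

```python
import numpy as np


def td0_linear(transitions, phi, gamma, step_size, theta0):
    """TD(0) with linear function approximation: V(s) is phi[s] @ theta."""
    theta = np.array(theta0, dtype=float)
    for s, r, s_next in transitions:
        # Temporal difference error for the observed transition.
        td_error = r + gamma * (phi[s_next] @ theta) - (phi[s] @ theta)
        # Semi-gradient step along the feature vector of the current state.
        theta += step_size * td_error * phi[s]
    return theta


# Hypothetical toy chain under a fixed policy:
# state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0.
phi = np.eye(2)  # one-hot features, so theta holds the state values directly
transitions = [(0, 1.0, 1), (1, 0.0, 0)] * 1000
theta = td0_linear(transitions, phi, gamma=0.5, step_size=0.5, theta0=[0.0, 0.0])
print(theta)
```

With a discount factor of 0.5, the Bellman equations V0 = 1 + 0.5 V1 and V1 = 0.5 V0 give V0 = 4/3 and V1 = 2/3, which the iterates approach on this deterministic chain.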
-
Authors: Blanchet, Jose; Kang, Yang
Affiliations: Stanford University; Columbia University
Abstract: We present a novel inference approach that we call sample out-of-sample inference. The approach can be used widely, ranging from semisupervised learning to stress testing, and it is fundamental in the application of data-driven distributionally robust optimization. Our method enables measuring the impact of plausible out-of-sample scenarios in a given performance measure of interest, such as a financial loss. The methodology is inspired by empirical likelihood (EL), but we optimize the empiric...
-
Authors: Chen, Ningyuan; Chen, Ying-Ju
Affiliations: University of Toronto; Hong Kong University of Science & Technology; Hong Kong University of Science & Technology
Abstract: We consider two firms selling products to a market of network-connected customers. Each firm is selling one product, and the two products are substitutable. The customers make purchases based on the multinomial logit model, and the firms compete for their purchasing probabilities. We characterize possible Nash equilibria for homogeneous network interactions and identical firms: When the network effects are weak, there is a symmetric equilibrium that the two firms evenly split the market; when ...