-
Authors: Chan, Timothy C. Y.; Fernandes, Craig; Puterman, Martin L.
Affiliations: University of Toronto; University of British Columbia
Abstract: To develop a novel approach for performance assessment, this paper considers the problem of computing value functions in professional American football. We provide a theoretical justification for using a dynamic programming approach to estimating value functions in sports by formulating the problem as a Markov chain for two asymmetric teams. We show that the Bellman equation has a unique solution equal to the bias of the underlying infinite horizon Markov reward process. This result provides, ...
-
Authors: Tian, Feng; Sun, Peng; Duenyas, Izak
Affiliations: University of Michigan System; University of Michigan; Duke University
Abstract: A principal hires an agent to repair a machine when it is down and maintain it when it is up and earns a revenue flow when the machine is up. Both the up- and downtimes follow exponential distributions. If the agent exerts effort, the downtime is shortened, and uptime is prolonged. Effort, however, is costly to the agent and unobservable to the principal. We study optimal dynamic contracts that always induce the agent to exert effort while maximizing the principal's profits. We formulate the c...
-
Authors: Bhandari, Jalaj; Russo, Daniel; Singal, Raghav
Affiliations: Columbia University
Abstract: Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its theoretical analysis has proved challenging and few guarantees on its statistical efficiency are available. In this work, we provide a simple and explicit finite time analysis of temporal difference learning with linear function approximation. Excep...
-
Authors: Chen, Ningyuan; Gallego, Guillermo
Affiliations: University of Toronto; Hong Kong University of Science & Technology
Abstract: Personalized pricing analytics is becoming an essential tool in retailing. Upon observing the personalized information of each arriving customer, the firm needs to set a price accordingly based on the covariates, such as income, education background, and past purchasing history, to extract more revenue. For new entrants of the business, the lack of historical data may severely limit the power and profitability of personalized pricing. We propose a nonparametric pricing policy to simultaneously...
-
Authors: Blanchet, Jose; Kang, Yang
Affiliations: Stanford University; Columbia University
Abstract: We present a novel inference approach that we call sample out-of-sample inference. The approach can be used widely, ranging from semisupervised learning to stress testing, and it is fundamental in the application of data-driven distributionally robust optimization. Our method enables measuring the impact of plausible out-of-sample scenarios in a given performance measure of interest, such as a financial loss. The methodology is inspired by empirical likelihood (EL), but we optimize the empiric...
-
Authors: Hwang, Dawsen; Jaillet, Patrick; Manshadi, Vahideh
Affiliations: Alphabet Inc.; Google Incorporated; Massachusetts Institute of Technology (MIT); Yale University
Abstract: For online resource allocation problems, we propose a new demand arrival model where the sequence of arrivals contains both an adversarial component and a stochastic one. Our model requires no demand forecasting; however, because of the presence of the stochastic component, we can partially predict future demand as the sequence of arrivals unfolds. Under the proposed model, we study the problem of the online allocation of a single resource to two types of customers and design online algorithms...