-
Authors: Qu, Guannan; Wierman, Adam; Li, Na
Affiliations: Carnegie Mellon University; California Institute of Technology; Harvard University
Abstract: We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact in a local manner where the objective is to find localized policies such that the (discounted) global reward is maximized. A fundamental challenge in this setting is that the state-action space size scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a scalable actor critic (SAC) framework that exploits the networ...
-
Authors: Wu, Chenguang (Allen); Bassamboo, Achal; Perry, Ohad
Affiliations: Hong Kong University of Science & Technology; Northwestern University; Northwestern University
Abstract: As empirically observed in restaurants, call centers, and intensive care units, service times needed by customers are often related to the delay they experience in queue. Two forms of dependence mechanisms in service systems with customer abandonment immediately come to mind: First, the service requirement of a customer may evolve while waiting in queue, in which case the service time of each customer is endogenously determined by the system's dynamics. Second, customers may arrive (exogenousl...
-
Authors: Farias, Vivek F.; Gutin, Eli
Affiliations: Massachusetts Institute of Technology (MIT); Uber Technologies, Inc.
Abstract: Recent years have seen a resurgence of interest in Bayesian algorithms for the multiarmed bandit (MAB) problem, such as Thompson sampling. These algorithms seek to exploit prior information on arm biases. The empirically observed performance of these algorithms makes them a compelling alternative to their frequentist counterparts. Nonetheless, there appears to be a wide range in empirical performance among such Bayesian algorithms. These algorithms also vary substantially in their design (as o...
-
Authors: Keppo, Jussi; Kim, Michael Jong; Zhang, Xinyuan
Affiliations: National University of Singapore; National University of Singapore; University of British Columbia
Abstract: We study optimal manipulation of a Bayesian learner through adaptive provisioning of information. The problem is motivated by settings in which a firm can disseminate possibly biased information at a cost, to influence the public's belief about a hidden parameter related to the firm's payoffs. For example, firms advertise to sell products. We study a sequential optimization model in which the firm dynamically decides on the quantity and content of information sent to the public, aiming to maximi...
-
Authors: Wang, Yining; Wang, He
Affiliations: State University System of Florida; University of Florida; University System of Georgia; Georgia Institute of Technology
Abstract: Price-based revenue management is an important problem in operations management with many practical applications. The problem considers a seller who sells one or multiple products over T consecutive periods and is subject to constraints on the initial inventory levels of resources. Whereas, in theory, the optimal pricing policy could be obtained via dynamic programming, computing the exact dynamic programming solution is often intractable. Approximate policies, such as the resolving heuristics, a...
-
Authors: Bakshi, Gurdip; Crosby, John; Gao, Xiaohui
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Temple University; Old Dominion University
Abstract: Emphasizing the statistics of jumps crossing the strike and local time, we develop a decomposition of equity option risk premiums. Operationalizing this theoretical treatment, we equip the pricing kernel process with unspanned risks, embed (unspanned) jump risks, and allow equity return volatility to contain unspanned risks. Unspanned risks are consistent with negative risk premiums for jumps crossing the strike and local time and imply negative risk premiums for out-of-the-money call options ...
-
Authors: Bruce, Norris I.; Krishnamoorthy, Anand; Prasad, Ashutosh
Affiliations: University of North Carolina; University of North Carolina Chapel Hill; State University System of Florida; University of Central Florida; University of California System; University of California Riverside
Abstract: This paper uses dynamic optimization to study the optimal advertising of fashion products over time. For fashion products, brand advertising and exclusivity are important sales drivers. Therefore, we propose a dynamic model of the sales of multiple styles of a fashion brand based on these variables. The model is estimated using a particle filter method on data from two fashion categories (handbags and sunglasses) and has good fit and prediction. We also derive explicit analytical solutions o...
-
Authors: Hu, Yichun; Kallus, Nathan; Mao, Xiaojie
Affiliations: Cornell University; Tsinghua University
Abstract: We study a nonparametric contextual bandit problem in which the expected reward functions belong to a Hölder class with smoothness parameter beta. We show how this interpolates between two extremes that were previously studied in isolation: nondifferentiable bandits (beta at most 1), with which rate-optimal regret is achieved by running separate noncontextual bandits in different context regions, and parametric-response bandits (infinite beta), with which rate-optimal regret can be achieved with...
-
Authors: Park, Chiwoo; Do Noh, Sang; Srivastava, Anuj
Affiliations: State University System of Florida; Florida State University; Sungkyunkwan University (SKKU); State University System of Florida; Florida State University
Abstract: The analysis of motion and time has become significant in operations research, especially for analyzing work performance in manufacturing and service operations in the development of lean manufacturing and smart factory. This paper develops a framework for data-driven analysis of work motions and studies their correlations to work speeds or execution rates, using data collected from modern motion sensors. Past efforts primarily relied on manual steps involving time-consuming stop-watching, vid...
-
Authors: Kallus, Nathan; Uehara, Masatoshi
Affiliations: Cornell University
Abstract: Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In this paper, we study the role of Markovian and time-invariant structure in efficient OPE. We first derive the efficiency bounds and efficient influence functions for OPE when one assumes each of these structures. This precisely characterizes the curse of horizon: in time-variant processes, OPE is only feasible ...