您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 运营管理 > Operations Research > 2022 > 6期

6期

Operations Research publishes quality operations research and management science works of interest to the OR practitioner and researcher in three substantive categories: methods, data-based operational science, and the practice of OR.

主办单位: INFORMS

期刊语言: 英语

创刊时间: 1952年

出版周期: 双月刊

国际电子刊号: 1526-5463

影响因子: 2.6

最新文章

Nonasymptotic Analysis of Monte Carlo Tree Search

作者：Shah, Devavrat; Xie, Qiaomin; Xu, Zhi

作者单位：Massachusetts Institute of Technology (MIT); University of Wisconsin System; University of Wisconsin Madison

摘要：In this work, we consider the popular tree-based search strategy within the framework of reinforcement learning, the Monte Carlo tree search (MCTS), in the context of the infinite-horizon discounted cost Markov decision process (MDP). Although MCTS is believed to provide an approximate value function for a given state with enough simulations, the claimed proof of this property is incomplete. This is because the variant of MCTS, the upper confidence bound for trees (UCT), analyzed in prior work...
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning

作者：Kallus, Nathan; Uehara, Masatoshi

作者单位：Cornell University

摘要：Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In this paper, we study the role of Markovian and time-invariant structure in efficient OPE. We first derive the efficiency bounds and efficient influence functions for OPE when one assumes each of these structures. This precisely characterizes the curse of horizon: in time-variant processes, OPE is only feasible ...
Scalable Reinforcement Learning for Multiagent Networked Systems

作者：Qu, Guannan; Wierman, Adam; Li, Na

作者单位：Carnegie Mellon University; California Institute of Technology; Harvard University

摘要：We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact in a local manner where the objective is to find localized policies such that the (discounted) global reward is maximized. A fundamental challenge in this setting is that the state-action space size scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a scalable actor critic (SAC) framework that exploits the networ...
Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes

作者：Hu, Yichun; Kallus, Nathan; Mao, Xiaojie

作者单位：Cornell University; Tsinghua University

摘要：We study a nonparametric contextual bandit problem in which the expected reward functions belong to a Holder class with smoothness parameter beta. We showhowthis interpolates between two extremes that were previously studied in isolation: nondifferentiable bandits (beta at most 1), with which rate-optimal regret is achieved by running separate noncontextual bandits in different context regions, and parametric-response bandits (infinite beta), with which rate-optimal regret can be achieved with...
Learning Manipulation Through Information Dissemination

作者：Keppo, Jussi; Kim, Michael Jong; Zhang, Xinyuan

作者单位：National University of Singapore; National University of Singapore; University of British Columbia

摘要：We study optimal manipulation of a Bayesian learner through adaptive provisioning of information. The problem is motivated by settings in which a firm can disseminate possibly biased information at a cost, to influence the public's belief about a hidden parameter related to the firm's payoffs. For example, firms advertise to sell products. We study a sequential optimizationmodel in which the firmdynamically decides on the quantity and content of information sent to the public, aiming to maximi...
Plausible Screening Using Functional Properties for Simulations with Large Solution Spaces

作者：Eckman, David J.; Plumlee, Matthew; Nelson, Barry L.

作者单位：Texas A&M University System; Texas A&M University College Station; Northwestern University

摘要：When working with models that allow for many candidate solutions, simulation practitioners can benefit fromscreening out unacceptable solutions in a statistically controlled way. However, for large solution spaces, estimating the performance of all solutions through simulation can prove impractical. We propose a statistical framework for screening solutions even when only a relatively small subset of them is simulated. Our framework derives its superiority over exhaustive screening approaches ...
ALSO-X and ALSO-X1: Better Convex Approximations for Chance Constrained Programs

作者：Jiang, Nan; Xie, Weijun

作者单位：Virginia Polytechnic Institute & State University

摘要：In a chance constrained program (CCP), decision makers seek the best decision whose probability of violating the uncertainty constraints is within the prespecified risk level. As a CCP is often nonconvex and is difficult to solve to optimality, much effort has been devoted to developing convex inner approximations for a CCP, among which the conditional value-at-risk (CVaR) has been known to be the best for more than a decade. This paper studies and generalizes the ALSO-X, originally proposed b...
Constant Regret Resolving Heuristics for Price-Based Revenue Management

作者：Wang, Yining; Wang, He

作者单位：State University System of Florida; University of Florida; University System of Georgia; Georgia Institute of Technology

摘要：Price-based revenue management is an important problem in operations management with many practical applications. The problemconsiders a sellerwho sells one ormultiple products over T consecutive periods and is subject to constraints on the initial inventory levels of resources. Whereas, in theory, the optimal pricing policy could be obtained via dynamic programming, computing the exact dynamic programming solution is often intractable. Approximate policies, such as the resolving heuristics, a...
Smoothness-Adaptive Contextual Bandits

作者：Gur, Yonatan; Momeni, Ahmadreza; Wager, Stefan

作者单位：Stanford University; Stanford University

摘要：We study a nonparametric multiarmed bandit problem with stochastic covariates, where a key complexity driver is the smoothness of payoff functions with respect to covariates. Previous studies have focused on deriving minimax-optimal algorithms in cases where it is a priori known how smooth the payoff functions are. In practice, however, the smoothness of payoff functions is typically not known in advance, and misspecification of smoothness may severely deteriorate the performance of existing m...
Asymptotically Optimal Control of a Centralized Dynamic Matching Market with General Utilities

作者：Blanchet, Jose H.; Reiman, Martin I.; Shah, Virag; Wein, Lawrence M.; Wu, Linjia

作者单位：Stanford University; Columbia University; Stanford University

摘要：We consider a matching market where buyers and sellers arrive according to independent Poisson processes at the same rate and independently abandon the market if not matched after an exponential amount of time with the same mean. In this centralized market, the utility for the system manager from matching any buyer and any seller is a general random variable. We consider a sequence of systems indexed by n where the arrivals in the nth system are sped up by a factor of n. We analyze two familie...