Management Science, 2022, Issue 3

Hedging the Drift: Learning to Optimize Under Nonstationarity

Publication type:

Article

Authors:

Cheung, Wang Chi; Simchi-Levi, David; Zhu, Ruihao

Affiliations:

National University of Singapore; Massachusetts Institute of Technology (MIT); Massachusetts Institute of Technology (MIT); Purdue University System; Purdue University

Journal:

MANAGEMENT SCIENCE

ISSN:

0025-1909

DOI:

10.1287/mnsc.2021.4024

Publication date:

2022

Pages:

1696-1713

Keywords:

Data-driven decision-making; non-stationary bandit optimization; parameter-free algorithms

Abstract:

We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of nonstationary stochastic bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) nonstationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Beginning with the linear bandit setting, we design and analyze a sliding window-upper confidence bound algorithm that achieves the optimal dynamic regret bound when the underlying variation budget is known. This budget quantifies the total amount of temporal variation of the latent environments. Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, our algorithm can further enjoy nearly optimal dynamic regret bounds in a (surprisingly) parameter-free manner. We extend our results to other related bandit problems, namely the multiarmed bandit, generalized linear bandit, and combinatorial semibandit settings, which model a variety of operations research applications. In addition to the classical exploration-exploitation trade-off, our algorithms leverage the power of the forgetting principle in the learning processes, which is vital in changing environments. Extensive numerical experiments with synthetic datasets and a dataset of an online auto-loan company during the severe acute respiratory syndrome (SARS) epidemic period demonstrate that our proposed algorithms achieve superior performance compared with existing algorithms.
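The abstract's two central ideas can be shown in the simpler multi-armed bandit setting, which the abstract lists among its extensions. The first is a sliding-window UCB learner that forgets stale observations. The second is a bandit-over-bandit (BOB) layer that tunes the window length without knowing the variation budget.

The Python below is a minimal sketch under stated assumptions. It is not the authors' algorithm. The following are all hypothetical choices made for illustration:

- the confidence bonus `sqrt(2 ln(min(s, w)) / n)`;
- the block length `ceil(sqrt(T))`;
- the power-of-two candidate windows;
- the use of EXP3 as the outer adversarial learner, and its exploration rate;
- the toy environment;
- every function name (`sw_ucb_block`, `bandit_over_bandit`, `make_switching_env`).

The paper's linear-bandit version instead works with a regularized least-squares estimate computed over the window.

```python
import math
import random
from collections import deque
from typing import Callable, Deque, List, Tuple

# Hypothetical environment interface: (arm, round, rng) -> reward in [0, 1].
RewardFn = Callable[[int, int, random.Random], float]


def sw_ucb_block(reward_fn: RewardFn, n_arms: int, t0: int, length: int,
                 window: int, rng: random.Random) -> List[float]:
    """Sliding-window UCB run from round t0 for `length` rounds.

    Only the last `window` observations are kept, so stale data is forgotten.
    """
    recent: Deque[Tuple[int, float]] = deque()  # (arm, reward) inside the window
    counts = [0] * n_arms                       # pulls per arm inside the window
    sums = [0.0] * n_arms                       # reward sums per arm inside the window
    rewards: List[float] = []
    for s in range(1, length + 1):
        log_term = math.log(min(s, window))     # assumed confidence-radius scaling
        indices = [math.inf if counts[a] == 0   # unseen or forgotten arms go first
                   else sums[a] / counts[a] + math.sqrt(2.0 * log_term / counts[a])
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=indices.__getitem__)
        r = reward_fn(arm, t0 + s - 1, rng)
        rewards.append(r)
        recent.append((arm, r))
        counts[arm] += 1
        sums[arm] += r
        if len(recent) > window:                # drop the observation leaving the window
            old_arm, old_r = recent.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_r
    return rewards


def bandit_over_bandit(reward_fn: RewardFn, n_arms: int, T: int,
                       rng: random.Random) -> List[float]:
    """BOB-style wrapper (T >= 1): an EXP3 master picks a window per block."""
    H = math.ceil(math.sqrt(T))                         # block length (assumed)
    windows = [2 ** i for i in range(H.bit_length())]   # candidate windows <= H (assumed)
    J = len(windows)
    n_blocks = math.ceil(T / H)
    gamma = (min(1.0, math.sqrt(J * math.log(J) / ((math.e - 1) * n_blocks)))
             if J > 1 else 0.0)                         # standard EXP3 exploration rate
    weights = [1.0] * J
    rewards: List[float] = []
    for b in range(n_blocks):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / J for w in weights]
        j = rng.choices(range(J), weights=probs)[0]
        length = min(H, T - b * H)
        # Restart the base learner in each block with the window the master chose.
        block = sw_ucb_block(reward_fn, n_arms, b * H + 1, length, windows[j], rng)
        rewards.extend(block)
        x_hat = (sum(block) / H) / probs[j]             # importance-weighted block reward
        weights[j] *= math.exp(gamma * x_hat / J)
        top = max(weights)
        weights = [w / top for w in weights]            # rescale to avoid overflow
    return rewards


def make_switching_env(T: int) -> RewardFn:
    """Toy nonstationary Bernoulli environment: the better arm swaps at T // 2."""
    def reward_fn(arm: int, t: int, rng: random.Random) -> float:
        means = [0.8, 0.2] if t <= T // 2 else [0.2, 0.8]
        return 1.0 if rng.random() < means[arm] else 0.0
    return reward_fn


if __name__ == "__main__":
    T = 1000
    env = make_switching_env(T)
    print("SW-UCB, window 32:", sum(sw_ucb_block(env, 2, 1, T, 32, random.Random(0))))
    print("BOB, no window given:", sum(bandit_over_bandit(env, 2, T, random.Random(0))))
```

In `bandit_over_bandit`, each block restarts the windowed learner with the window the master picked. The block's rescaled reward is then fed back to EXP3, so the window is tuned online without any knowledge of how much the environment drifts. The printed totals depend on the random seed, so no particular values are claimed here.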