Online Network Revenue Management Using Thompson Sampling

Type:

Article

Authors:

Ferreira, Kris Johnson; Simchi-Levi, David; Wang, He

Affiliations:

Harvard University; Massachusetts Institute of Technology (MIT); Massachusetts Institute of Technology (MIT); University System of Georgia; Georgia Institute of Technology

Journal:

OPERATIONS RESEARCH

ISSN/ISBN:

0030-364X

DOI:

10.1287/opre.2018.1755

Publication date:

2018

Pages:

1586-1602

Keywords:

ASYMPTOTIC-BEHAVIOR; demand; algorithm; policies

Abstract:

We consider a price-based network revenue management problem in which a retailer aims to maximize revenue from multiple products with limited inventory over a finite selling season. As is common in practice, we assume the demand function contains unknown parameters that must be learned from sales data. In the presence of these unknown demand parameters, the retailer faces a trade-off commonly referred to as the exploration-exploitation trade-off. Toward the beginning of the selling season, the retailer may offer several different prices to try to learn demand at each price (exploration objective). Over time, the retailer can use this knowledge to set a price that maximizes revenue throughout the remainder of the selling season (exploitation objective). We propose a class of dynamic pricing algorithms that builds on the simple, yet powerful, machine learning technique known as Thompson sampling to address the challenge of balancing the exploration-exploitation trade-off under the presence of inventory constraints. Our algorithms have both strong theoretical performance guarantees and promising numerical performance results when compared with other algorithms developed for similar settings. Moreover, we show how our algorithms can be extended for use in general multiarmed bandit problems with resource constraints as well as in applications in other revenue management settings and beyond.
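
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single product, a small set of candidate prices, Bernoulli demand (at most one sale per period), and independent Beta(1, 1) priors on the purchase probability at each price. In each period it samples demand from the posterior, solves a small linear program that caps expected sales per period at inventory divided by horizon, and draws a price from the resulting distribution. The names `solve_pricing_lp` and `thompson_pricing` and all parameter values are hypothetical.

```python
import random


def solve_pricing_lp(prices, demand, c):
    """Maximize sum_k x[k] * prices[k] * demand[k] subject to
    sum_k x[k] * demand[k] <= c, sum_k x[k] <= 1, x >= 0.

    With only two constraints, some optimal solution mixes at most two
    prices, so enumerating single prices and pairs is enough.
    """
    n = len(prices)
    best_x, best_rev = [0.0] * n, 0.0

    def consider(x):
        nonlocal best_x, best_rev
        rev = sum(xi * p * d for xi, p, d in zip(x, prices, demand))
        if rev > best_rev + 1e-12:
            best_x, best_rev = x, rev

    # One price only: use it as often as the inventory rate allows.
    for k in range(n):
        x = [0.0] * n
        x[k] = 1.0 if demand[k] <= c else c / demand[k]
        consider(x)
    # Two prices with both constraints tight.
    for j in range(n):
        for k in range(j + 1, n):
            gap = demand[j] - demand[k]
            if abs(gap) < 1e-12:
                continue
            xj = (c - demand[k]) / gap
            if 0.0 <= xj <= 1.0:
                x = [0.0] * n
                x[j], x[k] = xj, 1.0 - xj
                consider(x)
    return best_x


def thompson_pricing(prices, true_demand, inventory, horizon, seed=0):
    """Simulate one selling season; returns (revenue, units_sold)."""
    rng = random.Random(seed)
    n = len(prices)
    alpha, beta = [1] * n, [1] * n   # Beta(1, 1) prior per price (assumption)
    c = inventory / horizon          # inventory available per period
    revenue, remaining = 0.0, inventory
    for _ in range(horizon):
        if remaining == 0:
            break
        # Sample a demand estimate for each price from its posterior.
        sampled = [rng.betavariate(alpha[k], beta[k]) for k in range(n)]
        x = solve_pricing_lp(prices, sampled, c)
        # Offer price k with probability x[k]; otherwise offer nothing.
        u, cum, chosen = rng.random(), 0.0, None
        for k, xk in enumerate(x):
            cum += xk
            if u < cum:
                chosen = k
                break
        if chosen is None:
            continue
        # Observe a sale (or not) and update the posterior for that price.
        if rng.random() < true_demand[chosen]:
            revenue += prices[chosen]
            remaining -= 1
            alpha[chosen] += 1
        else:
            beta[chosen] += 1
    return revenue, inventory - remaining


if __name__ == "__main__":
    print(thompson_pricing([1.0, 2.0], [0.9, 0.3], inventory=30, horizon=100))
```

The linear program is what separates this sketch from plain Thompson sampling: when inventory is scarce, mixing a high and a low price can earn more in expectation than always charging the single price with the highest sampled revenue per period.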