Management Science, 2024, Issue 12

Maximal Objectives in the Multiarmed Bandit with Applications

Publication type:

Article

Authors:

Ozbay, Eren; Kamble, Vijay

Affiliations:

University of Illinois System; University of Illinois Chicago; University of Illinois Chicago Hospital

Journal:

MANAGEMENT SCIENCE

ISSN:

0025-1909

DOI:

10.1287/mnsc.2022.00801

Publication date:

2024

Keywords:

multiarmed bandits; L∞ objective; online platforms

Abstract:

In several applications of the stochastic multiarmed bandit problem, the traditional objective of maximizing the expected total reward can be inappropriate. In this paper, we study a new objective in the classic setup. Given K arms, instead of maximizing the expected total reward from T pulls (the traditional sum objective), we consider the vector of total rewards earned from each of the K arms at the end of T pulls and aim to maximize the expected highest total reward across arms (the max objective). For this objective, we show that any policy must incur an instance-dependent asymptotic regret of $\Omega(\log T)$ (with a higher instance-dependent constant than under the traditional objective) and a worst-case regret of $\Omega(K^{1/3} T^{2/3})$. We then design an adaptive explore-then-commit policy that explores using appropriately tuned confidence bounds on the mean rewards and stops exploring according to an adaptive criterion; the policy adapts to the problem difficulty and simultaneously achieves both bounds (up to logarithmic factors). We then generalize our algorithmic insights to the problem of maximizing the expected average total reward of the top m arms, i.e., the m arms with the highest total rewards. Our numerical experiments demonstrate the efficacy of our policies compared with several natural alternatives in practical parameter regimes. We discuss applications of these new objectives to the problem of conditioning an adequate supply of value-providing market entities (workers/sellers/service providers) in online platforms and marketplaces.
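To make the max objective and the explore-then-commit idea concrete, here is a minimal simulation sketch. It is not the authors' policy: it assumes Bernoulli rewards, a Hoeffding-style confidence radius sqrt(log T / (2n)), elimination of arms whose upper bound falls below the best lower bound, and a per-arm exploration cap of about (T/K)^(2/3) (chosen only because K·(T/K)^(2/3) = K^(1/3)·T^(2/3) matches the worst-case rate). The names `adaptive_etc_max`, `max_objective`, and `top_m_objective` are hypothetical. The key point it illustrates is that, under the max objective, rewards earned while exploring arms that are not finally committed to do not count, so exploration is costlier than under the sum objective.

```python
import math
import random


def adaptive_etc_max(means, horizon, seed=0):
    """Hypothetical adaptive explore-then-commit sketch for the max objective.

    means   : true Bernoulli means of the K arms (used only to simulate rewards).
    horizon : total number of pulls T.
    Returns (per_arm_totals, committed_arm).
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k       # number of pulls of each arm so far
    totals = [0.0] * k    # total reward earned from each arm so far
    # Assumed per-arm exploration cap ~ (T/K)^(2/3); the paper's tuning may differ.
    cap = max(1, math.ceil((horizon / k) ** (2 / 3)))
    t = 0

    def pull(arm):
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward

    def radius(arm):
        # Assumed Hoeffding-style confidence radius.
        return math.sqrt(math.log(horizon) / (2 * pulls[arm]))

    active = list(range(k))
    # Exploration: round-robin over the arms that are still plausible winners.
    while t + len(active) <= horizon and len(active) > 1:
        for arm in active:
            pull(arm)
            t += 1
        mu = {a: totals[a] / pulls[a] for a in active}
        lcb = {a: mu[a] - radius(a) for a in active}
        ucb = {a: mu[a] + radius(a) for a in active}
        best_lcb = max(lcb.values())
        # Drop arms that are confidently worse than some other arm
        # (the arm attaining best_lcb is always kept, so active stays nonempty).
        active = [a for a in active if ucb[a] >= best_lcb]
        # Adaptive stopping: the loop ends when one arm remains; also stop at the cap.
        if pulls[active[0]] >= cap:
            break

    # Commit: spend every remaining pull on the empirically best active arm.
    committed = max(active, key=lambda a: totals[a] / max(pulls[a], 1))
    while t < horizon:
        pull(committed)
        t += 1
    return totals, committed


def max_objective(totals):
    # Max objective: highest total reward earned by any single arm.
    return max(totals)


def top_m_objective(totals, m):
    # Generalization: average total reward of the m arms with the highest totals.
    return sum(sorted(totals, reverse=True)[:m]) / m


totals, arm = adaptive_etc_max([0.9, 0.2, 0.1], horizon=10_000, seed=1)
print(arm, max_objective(totals), top_m_objective(totals, 2))
```

On easy instances (large gaps) the elimination rule ends exploration after few pulls, while on hard instances the cap bounds the exploration spent; this mirrors, in a simplified way, how an adaptive stopping criterion can adapt to problem difficulty.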