Maximal Objectives in the Multiarmed Bandit with Applications
Document type:
Article
Authors:
Ozbay, Eren; Kamble, Vijay
Affiliations:
University of Illinois System; University of Illinois Chicago; University of Illinois Chicago Hospital
Journal:
MANAGEMENT SCIENCE
ISSN/ISBN:
0025-1909
DOI:
10.1287/mnsc.2022.00801
Publication date:
2024
Keywords:
multiarmed bandits
L∞ objective
Online platforms
Abstract:
In several applications of the stochastic multiarmed bandit problem, the traditional objective of maximizing the expected total reward can be inappropriate. In this paper, we study a new objective in the classic setup. Given K arms, instead of maximizing the expected total reward from T pulls (the traditional sum objective), we consider the vector of total rewards earned from each of the K arms at the end of T pulls and aim to maximize the expected highest total reward across arms (the max objective). For this objective, we show that any policy must incur an instance-dependent asymptotic regret of Ω(log T) (with a higher instance-dependent constant compared with the traditional objective) and a worst-case regret of Ω(K^{1/3} T^{2/3}). We then design an adaptive explore-then-commit policy featuring exploration based on appropriately tuned confidence bounds on the mean reward and an adaptive stopping criterion, which adapts to the problem difficulty and simultaneously achieves these bounds (up to logarithmic factors). We then generalize our algorithmic insights to the problem of maximizing the expected value of the average total reward of the top m arms with the highest total rewards. Our numerical experiments demonstrate the efficacy of our policies compared with several natural alternatives in practical parameter regimes. We discuss applications of these new objectives to the problem of conditioning an adequate supply of value-providing market entities (workers/sellers/service providers) in online platforms and marketplaces.
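To make the three objectives concrete, the Python sketch below computes the sum, max, and top-m objectives from a vector of per-arm total rewards and runs a simplified explore-then-commit loop. It is not the authors' adaptive policy. The Bernoulli arms, the confidence radius sqrt(2 log T / n), the exploration budget ceil(T^(2/3)), and all function names are illustrative assumptions.

```python
import math
import random


def sum_objective(totals):
    """Traditional objective: total reward summed over all arms."""
    return sum(totals)


def max_objective(totals):
    """Max objective: highest total reward earned by any single arm."""
    return max(totals)


def top_m_objective(totals, m):
    """Average total reward of the m arms with the highest totals."""
    return sum(sorted(totals, reverse=True)[:m]) / m


def explore_then_commit(means, T, seed=0):
    """Simplified explore-then-commit loop (illustrative assumptions only).

    Bernoulli arms are explored round-robin. After each full round, exploration
    stops if the empirical leader's lower confidence bound exceeds every other
    arm's upper confidence bound (assumed radius sqrt(2 log T / n)), or once a
    hypothetical budget of ceil(T ** (2/3)) pulls is spent. All remaining pulls
    then go to the arm with the highest empirical mean.
    """
    rng = random.Random(seed)
    K = len(means)
    totals = [0.0] * K  # total reward earned from each arm
    counts = [0] * K    # number of pulls of each arm
    budget = min(T, math.ceil(T ** (2 / 3)))
    t = 0

    def pull(i):
        # Bernoulli reward with success probability means[i].
        totals[i] += 1.0 if rng.random() < means[i] else 0.0
        counts[i] += 1

    # Exploration phase.
    while t < budget:
        pull(t % K)
        t += 1
        if t % K == 0:  # every arm has been pulled equally often (at least once)
            avg = [s / n for s, n in zip(totals, counts)]
            rad = [math.sqrt(2 * math.log(T) / n) for n in counts]
            leader = max(range(K), key=lambda i: avg[i])
            if all(avg[leader] - rad[leader] > avg[j] + rad[j]
                   for j in range(K) if j != leader):
                break

    # Commit phase: concentrate all remaining pulls on a single arm.
    avg = [s / max(n, 1) for s, n in zip(totals, counts)]
    best = max(range(K), key=lambda i: avg[i])
    for _ in range(T - t):
        pull(best)
    return totals


if __name__ == "__main__":
    totals = explore_then_commit([0.5, 0.6, 0.7], T=10_000, seed=1)
    print(sum_objective(totals), max_objective(totals), top_m_objective(totals, 2))
```

Committing every remaining pull to one arm reflects the structure of the max objective: pulls spread over several arms still add to the sum objective, but they add nothing to the highest per-arm total unless that arm ends up on top. With m = 1, `top_m_objective` reduces to `max_objective`.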