Multiplayer Bandits: A Trekking Approach

成果类型:
Article
署名作者:
Hanawal, Manjesh Kumar; Darak, Sumit J.
署名单位:
Indian Institute of Technology System (IIT System); Indian Institute of Technology (IIT) - Bombay; Indraprastha Institute of Information Technology Delhi
刊物名称:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISSBN:
0018-9286
DOI:
10.1109/TAC.2021.3077454
发表日期:
2022
页码:
2237-2252
关键词:
Heuristic algorithms games sensors Synchronization information exchange batteries Base stations Distributed Learning multiplayer bandits static and dynamic players
摘要:
In this article, we study stochastic multiarmed bandits with many players. The players do not know the number of players, cannot communicate with each other, and if multiple players select a common arm, they collide and none of them receive any reward. We consider the static scenario, where the number of players remains fixed, and the dynamic scenario, where the players enter and leave at any time. We provide algorithms based on a novel trekking approach that guarantees constant regret for the static case and sublinear regret for the dynamic case with high probability. The trekking approach eliminates the need to estimate the number of players resulting in fewer collisions and improved regret performance compared to state-of-the-art algorithms. We also develop an epoch-less algorithm that eliminates any requirement of time synchronization across the players provided each player can detect the presence of other players on an arm. We validate our theoretical guarantees using simulation-based and real testbed-based experiments.