您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 管理科学与工程 > IEEE Transactions on Automatic Control > 2022 > 5期

Multiplayer Bandits: A Trekking Approach

成果类型：

Article

署名作者：

Hanawal, Manjesh Kumar; Darak, Sumit J.

署名单位：

Indian Institute of Technology System (IIT System); Indian Institute of Technology (IIT) - Bombay; Indraprastha Institute of Information Technology Delhi

刊物名称：

IEEE TRANSACTIONS ON AUTOMATIC CONTROL

ISSN/ISSBN：

0018-9286

DOI：

10.1109/TAC.2021.3077454

发表日期：

2022

页码：

2237-2252

关键词：

Heuristic algorithms games sensors Synchronization information exchange batteries Base stations Distributed Learning multiplayer bandits static and dynamic players

摘要：

In this article, we study stochastic multiarmed bandits with many players. The players do not know the number of players, cannot communicate with each other, and if multiple players select a common arm, they collide and none of them receive any reward. We consider the static scenario, where the number of players remains fixed, and the dynamic scenario, where the players enter and leave at any time. We provide algorithms based on a novel trekking approach that guarantees constant regret for the static case and sublinear regret for the dynamic case with high probability. The trekking approach eliminates the need to estimate the number of players resulting in fewer collisions and improved regret performance compared to state-of-the-art algorithms. We also develop an epoch-less algorithm that eliminates any requirement of time synchronization across the players provided each player can detect the presence of other players on an arm. We validate our theoretical guarantees using simulation-based and real testbed-based experiments.