Multiplayer Bandits Without Observing Collision Information
成果类型:
Article
署名作者:
Lugosi, Gabor; Mehrabian, Abbas
署名单位:
Pompeu Fabra University; ICREA; McGill University
刊物名称:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISSBN:
0364-765X
DOI:
10.1287/moor.2021.1168
发表日期:
2022
页码:
1247-1265
关键词:
multiarmed bandit
摘要:
We study multiplayer stochastic multiarmed bandit problems in which the players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider two feedback models: a model in which the players can observe whether a collision has occurred and a more difficult setup in which no collision information is available. We give the first theoretical guarantees for the second model: an algorithm with a logarithmic regret and an algorithm with a square-root regret that does not depend on the gaps between the means. For the first model, we give the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also give an algorithm for reaching approximate Nash equilibria quickly in stochastic anticoordination games.