Consensus-Based Thompson Sampling for Stochastic Multiarmed Bandits
Publication Type:
Article
Author:
Hayashi, Naoki
Affiliation:
University of Osaka
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2024.3426379
Publication Year:
2025
Pages:
293-306
Keywords:
Bayes methods
Optimization
Multi-agent systems
Stochastic processes
Information exchange
Scalability
Power system stability
Distributed Thompson sampling
Multiagent system
Stochastic bandit problem
Abstract:
This article considers a distributed Thompson sampling algorithm for a cooperative multiplayer multiarmed bandit problem. We consider a multiagent system in which each agent pulls an arm according to consensus-based Bayesian inference with probability matching. To estimate the reward probability of each arm, a group of agents shares the observed rewards with neighboring agents in a communication graph. Following the information exchange, each agent updates its estimate of the posterior distributions based on its own observed reward and the information received from the neighboring agents. Then, each agent decides which arm to select at the next iteration based on the estimated posterior distribution. We demonstrate that the expected regret of the multiagent system under the proposed distributed Thompson sampling algorithm grows logarithmically in the number of iterations. Numerical examples show that agents can effectively identify the optimal arm by cooperatively learning the reward distribution of a set of arms.
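The scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm, only an assumed Bernoulli-bandit setup: each agent keeps a Beta posterior per arm, pulls the arm whose posterior sample is largest (probability matching), then updates on its own reward and on rewards shared by its neighbors in the communication graph.

```python
import random

def consensus_thompson(true_probs, neighbors, iterations, seed=0):
    """Sketch of consensus-based Thompson sampling (assumed setup).

    true_probs: Bernoulli success probability of each arm.
    neighbors:  dict mapping agent index -> list of neighbor indices
                (the communication graph).
    Returns per-agent pull counts for each arm.
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(neighbors), len(true_probs)
    # Beta(1, 1) priors for every (agent, arm) pair.
    alpha = [[1.0] * n_arms for _ in range(n_agents)]
    beta = [[1.0] * n_arms for _ in range(n_agents)]
    pulls = [[0] * n_arms for _ in range(n_agents)]

    for _ in range(iterations):
        obs = []
        for i in range(n_agents):
            # Probability matching: sample each arm's posterior, pull the max.
            samples = [rng.betavariate(alpha[i][k], beta[i][k])
                       for k in range(n_arms)]
            arm = max(range(n_arms), key=lambda k: samples[k])
            reward = 1 if rng.random() < true_probs[arm] else 0
            obs.append((arm, reward))
            pulls[i][arm] += 1
        # Information exchange: each agent updates its posteriors with its
        # own observation and those received from neighboring agents.
        for i in range(n_agents):
            for j in [i] + list(neighbors[i]):
                arm, reward = obs[j]
                alpha[i][arm] += reward
                beta[i][arm] += 1 - reward
    return pulls

# Three agents on a line graph, two Bernoulli arms (arm 0 is optimal).
pulls = consensus_thompson([0.8, 0.3], {0: [1], 1: [0, 2], 2: [1]}, 500)
```

With a reward gap this large, every agent concentrates its pulls on the optimal arm after a few hundred iterations, consistent with the logarithmic-regret behavior the article establishes.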