Consensus-Based Thompson Sampling for Stochastic Multiarmed Bandits

Type:
Article
Author:
Hayashi, Naoki
Affiliation:
University of Osaka
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2024.3426379
Publication Year:
2025
Pages:
293-306
Keywords:
Bayes methods; optimization; multi-agent systems; stochastic processes; information exchange; scalability; power system stability; distributed Thompson sampling; multiagent system; stochastic bandit problem
Abstract:
This article considers a distributed Thompson sampling algorithm for a cooperative multiplayer multiarmed bandit problem. We consider a multiagent system in which each agent pulls an arm according to consensus-based Bayesian inference with probability matching. To estimate the reward probability of each arm, a group of agents shares the observed rewards with neighboring agents over a communication graph. Following this information exchange, each agent updates its estimate of the posterior distributions based on its own observed reward and the information received from neighboring agents. Each agent then decides which arm to select at the next iteration based on the estimated posterior distribution. We show that the expected regret of the multiagent system under the proposed distributed Thompson sampling algorithm grows logarithmically with the number of iterations. Numerical examples show that the agents can effectively identify the optimal arm by cooperatively learning the reward distributions of the set of arms.
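To illustrate the scheme described in the abstract, the following is a minimal sketch, not the paper's exact algorithm: each agent runs Thompson sampling on Bernoulli arms with Beta posteriors, and at every iteration it incorporates the (arm, reward) observations of its neighbors in a fixed communication graph into its own posterior. The function name `run_consensus_ts`, the graph encoding, and all parameter choices are illustrative assumptions.

```python
import random

def run_consensus_ts(true_means, neighbors, n_agents, horizon, seed=0):
    """Sketch of consensus-style Thompson sampling (illustrative, not the
    paper's algorithm). Agents share raw observations with graph neighbors
    each round and maintain Beta(alpha, beta) posteriors per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [[1.0] * n_arms for _ in range(n_agents)]  # Beta prior Beta(1, 1)
    beta = [[1.0] * n_arms for _ in range(n_agents)]
    pulls = [[0] * n_arms for _ in range(n_agents)]
    for _ in range(horizon):
        obs = []
        for a in range(n_agents):
            # Probability matching: sample a mean from each arm's posterior
            # and pull the arm whose sample is largest.
            samples = [rng.betavariate(alpha[a][k], beta[a][k])
                       for k in range(n_arms)]
            arm = max(range(n_arms), key=lambda k: samples[k])
            reward = 1 if rng.random() < true_means[arm] else 0
            obs.append((arm, reward))
            pulls[a][arm] += 1
        # Information exchange: each agent folds its own observation and
        # its neighbors' observations into its posterior.
        for a in range(n_agents):
            for b in [a] + list(neighbors[a]):
                arm, reward = obs[b]
                alpha[a][arm] += reward
                beta[a][arm] += 1 - reward
    return pulls

# Example: 3 agents on a line graph (0-1-2) and 3 Bernoulli arms.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
pulls = run_consensus_ts([0.2, 0.5, 0.8], neighbors, n_agents=3, horizon=2000)
```

Sharing neighbors' raw observations, as above, lets every agent's posterior concentrate on the best arm faster than learning in isolation, which is the cooperative effect the abstract refers to.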