Consensus-Based Thompson Sampling for Stochastic Multiarmed Bandits

Type:
Article
Author:
Hayashi, Naoki
Affiliation:
University of Osaka
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2024.3426379
Publication Year:
2025
Pages:
293-306
Keywords:
Bayes methods; optimization; multi-agent systems; stochastic processes; information exchange; scalability; power system stability; distributed Thompson sampling; multiagent system; stochastic bandit problem
Abstract:
This article considers a distributed Thompson sampling algorithm for a cooperative multiplayer multiarmed bandit problem. We consider a multiagent system in which each agent pulls an arm according to consensus-based Bayesian inference with probability matching. To estimate the reward probability of each arm, a group of agents shares the observed rewards with neighboring agents over a communication graph. Following this information exchange, each agent updates its estimate of the posterior distributions based on its own observed reward and the information received from neighboring agents. Each agent then decides which arm to select at the next iteration based on the estimated posterior distribution. We show that the expected regret of the multiagent system under the proposed distributed Thompson sampling algorithm grows logarithmically with the number of iterations. Numerical examples show that the agents can effectively identify the optimal arm by cooperatively learning the reward distributions of the set of arms.
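To illustrate the scheme described in the abstract, the following is a minimal sketch, not the paper's exact algorithm: each agent runs Thompson sampling on Bernoulli arms with Beta posteriors, and at every iteration it incorporates the (arm, reward) observations of its neighbors in a fixed communication graph into its own posterior. The function name `run_consensus_ts`, the graph encoding, and all parameter choices are illustrative assumptions.

```python
import random

def run_consensus_ts(true_means, neighbors, n_agents, horizon, seed=0):
    """Sketch of consensus-style Thompson sampling (illustrative, not the
    paper's algorithm). Agents share raw observations with graph neighbors
    each round and maintain Beta(alpha, beta) posteriors per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [[1.0] * n_arms for _ in range(n_agents)]  # Beta prior Beta(1, 1)
    beta = [[1.0] * n_arms for _ in range(n_agents)]
    pulls = [[0] * n_arms for _ in range(n_agents)]
    for _ in range(horizon):
        obs = []
        for a in range(n_agents):
            # Probability matching: sample a mean from each arm's posterior
            # and pull the arm whose sample is largest.
            samples = [rng.betavariate(alpha[a][k], beta[a][k])
                       for k in range(n_arms)]
            arm = max(range(n_arms), key=lambda k: samples[k])
            reward = 1 if rng.random() < true_means[arm] else 0
            obs.append((arm, reward))
            pulls[a][arm] += 1
        # Information exchange: each agent folds its own observation and
        # its neighbors' observations into its posterior.
        for a in range(n_agents):
            for b in [a] + list(neighbors[a]):
                arm, reward = obs[b]
                alpha[a][arm] += reward
                beta[a][arm] += 1 - reward
    return pulls

# Example: 3 agents on a line graph (0-1-2) and 3 Bernoulli arms.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
pulls = run_consensus_ts([0.2, 0.5, 0.8], neighbors, n_agents=3, horizon=2000)
```

Sharing neighbors' raw observations, as above, lets every agent's posterior concentrate on the best arm faster than learning in isolation, which is the cooperative effect the abstract refers to.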