Distributed Multiarmed Bandits
Publication Type:
Article
Authors:
Zhu, Jingxuan; Liu, Ji
Affiliations:
State University of New York (SUNY) System; Stony Brook University
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2023.3247982
Publication Date:
2023
Pages:
3025-3040
Keywords:
Robot sensing systems
Distributed algorithms
Program processors
Random variables
Manipulators
Social networking (online)
Process control
Cooperative control
Machine learning
Agents and autonomous systems
Network analysis and control
Abstract:
This article studies a distributed multiarmed bandit problem with heterogeneous reward observations. The problem is cooperatively solved by N agents, each of which faces a common set of M arms yet observes only local biased rewards of those arms. The goal of each agent is to minimize the cumulative expected regret with respect to the true rewards of the arms, where the mean of each arm's true reward equals the average of the means of all agents' observed biased rewards. Each agent recursively updates its decision using information from its neighbors, and the neighbor relationships are described by a time-dependent directed graph G(t) whose vertices correspond to agents and whose arcs encode those relationships. A fully distributed bandit algorithm is proposed that couples the classical distributed averaging algorithm with the celebrated upper confidence bound (UCB) bandit algorithm. It is shown that for any uniformly strongly connected sequence of graphs G(t), the algorithm guarantees each agent a regret of order O(log T).
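To make the coupling described in the abstract concrete, the following Python snippet is a minimal illustrative sketch, not the authors' algorithm: each agent mixes its running arm-mean estimates with its neighbors' estimates through a stochastic weight matrix W(t) standing in for G(t), then plays the arm with the largest UCB-style index built from its local pull counts. The names (distributed_ucb, weight_fn, sigma) are hypothetical, and Gaussian local rewards plus a simplified estimate-update rule are assumed.

```python
import numpy as np

def distributed_ucb(local_means, weight_fn, T, sigma=1.0, seed=0):
    """Simplified distributed UCB sketch; returns per-agent cumulative regret."""
    rng = np.random.default_rng(seed)
    N, M = local_means.shape
    true_means = local_means.mean(axis=0)   # each arm's true mean = average of local biased means
    best = true_means.max()
    counts = np.ones((N, M))                # assume each agent has sampled every arm once
    z = local_means + rng.normal(0.0, sigma, size=(N, M))  # initial running estimates
    regret = np.zeros(N)

    for t in range(1, T + 1):
        W = weight_fn(t)                    # stochastic mixing matrix compatible with G(t)
        z = W @ z                           # consensus step: average neighbors' estimates
        for i in range(N):
            # UCB-style index from the consensus estimate and local pull counts
            index = z[i] + np.sqrt(2.0 * np.log(t + 1) / counts[i])
            k = int(np.argmax(index))
            reward = local_means[i, k] + rng.normal(0.0, sigma)  # local biased reward
            counts[i, k] += 1
            z[i, k] += (reward - z[i, k]) / counts[i, k]         # fold the new sample in
            regret[i] += best - true_means[k]                    # regret w.r.t. true rewards
    return regret


# Example: 4 agents, 3 arms, static complete graph with uniform weights.
if __name__ == "__main__":
    local = np.array([[0.2, 0.5, 0.1],
                      [0.6, 0.4, 0.3],
                      [0.1, 0.7, 0.2],
                      [0.3, 0.2, 0.8]])
    W = np.full((4, 4), 0.25)               # doubly stochastic, so estimates average out
    print(distributed_ucb(local, lambda t: W, T=2000))
```

In this sketch the mixing step is what drives every agent's estimates toward the average of the local biased means, which mirrors why the article measures regret against the averaged true rewards.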
Source URL: