
TRANSFER LEARNING FOR CONTEXTUAL MULTI-ARMED BANDITS

Type:

Article

Authors:

Cai, Changxiao; Cai, T. Tony; Li, Hongzhe

Affiliations:

University of Michigan System; University of Michigan; University of Pennsylvania; University of Pennsylvania

Journal:

ANNALS OF STATISTICS

ISSN/ISBN:

0090-5364

DOI:

10.1214/23-AOS2341

Publication date:

2024

Pages:

207-232

Keywords:

minimax adaptive estimation randomized allocation confidence bands bounds adaptation CLASSIFICATION inference regret curve sets

Abstract:

Motivated by a range of applications, we study in this paper the problem of transfer learning for nonparametric contextual multi-armed bandits under the covariate shift model, where we have data collected from source bandits before the start of the target bandit learning. The minimax rate of convergence for the cumulative regret is established and a novel transfer learning algorithm that attains the minimax regret is proposed. The results quantify the contribution of the data from the source domains for learning in the target domain in the context of nonparametric contextual multi-armed bandits. In view of the general impossibility of adaptation to unknown smoothness, we develop a data-driven algorithm that achieves near-optimal statistical guarantees (up to a logarithmic factor) while automatically adapting to the unknown parameters over a large collection of parameter spaces under an additional self-similarity assumption. A simulation study is carried out to illustrate the benefits of utilizing the data from the source domains for learning in the target domain.
