您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 管理科学与工程 > Operations Research > 2022 > 6期

Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes

成果类型：

Article

署名作者：

Hu, Yichun; Kallus, Nathan; Mao, Xiaojie

署名单位：

Cornell University; Tsinghua University

刊物名称：

OPERATIONS RESEARCH

ISSN/ISSBN：

0030-364X

DOI：

10.1287/opre.2021.2237

发表日期：

2022

页码：

3261-3281

关键词：

rates CONVERGENCE

摘要：

We study a nonparametric contextual bandit problem in which the expected reward functions belong to a Holder class with smoothness parameter beta. We showhowthis interpolates between two extremes that were previously studied in isolation: nondifferentiable bandits (beta at most 1), with which rate-optimal regret is achieved by running separate noncontextual bandits in different context regions, and parametric-response bandits (infinite beta), with which rate-optimal regret can be achieved with minimal or no exploration because of infinite extrapolatability. We develop a novel algorithm that carefully adjusts to all smoothness settings, and we prove its regret is rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, our work bridges the gap between the existing literature on parametric and nondifferentiable contextual bandit problems and between bandit algorithms that exclusively use global or local information, shedding light on the crucial interplay of complexity and regret in contextual bandits.