Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes

成果类型:
Article
署名作者:
Hu, Yichun; Kallus, Nathan; Mao, Xiaojie
署名单位:
Cornell University; Tsinghua University
刊物名称:
OPERATIONS RESEARCH
ISSN/ISSBN:
0030-364X
DOI:
10.1287/opre.2021.2237
发表日期:
2022
页码:
3261-3281
关键词:
rates CONVERGENCE
摘要:
We study a nonparametric contextual bandit problem in which the expected reward functions belong to a Holder class with smoothness parameter beta. We showhowthis interpolates between two extremes that were previously studied in isolation: nondifferentiable bandits (beta at most 1), with which rate-optimal regret is achieved by running separate noncontextual bandits in different context regions, and parametric-response bandits (infinite beta), with which rate-optimal regret can be achieved with minimal or no exploration because of infinite extrapolatability. We develop a novel algorithm that carefully adjusts to all smoothness settings, and we prove its regret is rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, our work bridges the gap between the existing literature on parametric and nondifferentiable contextual bandit problems and between bandit algorithms that exclusively use global or local information, shedding light on the crucial interplay of complexity and regret in contextual bandits.