STOCHASTIC CONTINUUM-ARMED BANDITS WITH ADDITIVE MODELS: MINIMAX REGRETS AND ADAPTIVE ALGORITHM
成果类型:
Article
署名作者:
Cai, T. Tony; Pu, Hongming
署名单位:
University of Pennsylvania
刊物名称:
ANNALS OF STATISTICS
ISSN/ISSBN:
0090-5364
DOI:
10.1214/22-AOS2182
发表日期:
2022
页码:
2179-2204
关键词:
dimensional linear-regression
Optimal Rates
confidence-intervals
optimization
摘要:
We consider d-dimensional stochastic continuum-armed bandits with the expected reward function being additive beta-Holder with sparsity s for 0 < beta < infinity and 1 <= s <= d. The rate of convergence (O) over tilde (s center dot T beta+1/2 beta+1) for the minimax regret is established where T is the number of rounds. In particular, the minimax regret does not depend on d and is linear in s. A novel algorithm is proposed and is shown to be rate-optimal, up to a logarithmic factor of T. The problem of adaptivity is also studied. A lower bound on the cost of adaptation to the smoothness is obtained and the result implies that adaptation for free is impossible in general without further structural assumptions. We then consider adaptive additive SCAB under an additional self-similarity assumption. An adaptive procedure is constructed and is shown to simultaneously achieve the minimax regret for a range of smoothness levels.