Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Type:
Article
Authors:
Garivier, Aurelien; Menard, Pierre; Stoltz, Gilles
Affiliations:
Universite de Toulouse; Universite Toulouse III - Paul Sabatier; Centre National de la Recherche Scientifique (CNRS); Hautes Etudes Commerciales (HEC) Paris
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.2017.0928
Publication date:
2019
Pages:
377-399
Keywords:
Bounds; complexity
Abstract:
We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used, and they involve no unnecessary complications.
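The bounds described in the abstract are driven by Kullback-Leibler divergences between arm distributions. As a minimal illustrative sketch (not the paper's proof, and with function names of my own choosing), the snippet below computes the Bernoulli KL divergence and the classical Lai-Robbins per-log(T) regret contribution of a suboptimal arm, which is the quantity that governs the logarithmic final-phase growth mentioned above:

```python
import math

def kl_bernoulli(p: float, q: float) -> float:
    """Kullback-Leibler divergence KL(Ber(p) || Ber(q))."""
    eps = 1e-15  # clamp away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def lai_robbins_rate(mu_star: float, mu: float) -> float:
    """Asymptotic regret contribution per log(T) of a suboptimal
    Bernoulli arm with mean mu, against an optimal arm with mean
    mu_star: (mu_star - mu) / KL(Ber(mu) || Ber(mu_star))."""
    return (mu_star - mu) / kl_bernoulli(mu, mu_star)

# Example: a two-armed Bernoulli bandit with means 0.6 and 0.4.
rate = lai_robbins_rate(0.6, 0.4)
```

Here `rate * log(T)` is the familiar asymptotic lower bound on the regret incurred by the suboptimal arm; the paper's contribution is a nonasymptotic refinement showing near-linear growth before this logarithmic regime sets in.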
Source URL: