Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
成果类型:
Article
署名作者:
Yang, YH; Zhu, D
署名单位:
Iowa State University; Iowa State University
刊物名称:
ANNALS OF STATISTICS
ISSN/ISSBN:
0090-5364
发表日期:
2002
页码:
100-121
关键词:
regression
摘要:
We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and appropriate randomization are used to select a good arm to play for a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm with further exploration of other arms, It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best arm (which depends on the covariates) almost surely.