The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

Publication type:
Article
Authors:
Ryzhov, Ilya O.; Powell, Warren B.; Frazier, Peter I.
Affiliations:
University System of Maryland; University of Maryland College Park; Princeton University; Cornell University
Journal:
OPERATIONS RESEARCH
ISSN:
0030-364X
DOI:
10.1287/opre.1110.0999
Publication date:
2012
Pages:
180-195
Keywords:
stage sampling allocations bandits selection index
Abstract:
We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach can handle prior beliefs about the rewards that are correlated, a case not handled by traditional multiarmed bandit methods. Experiments show that our knowledge gradient (KG) policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and that it outperforms many learning policies in the correlated case.
Source URL:
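To illustrate the one-period look-ahead idea described in the abstract, here is a minimal sketch of a knowledge-gradient-style decision rule. It assumes **independent** normal beliefs (it does not cover the correlated-belief case the paper emphasizes) and a known measurement noise variance. The one-step value of information for alternative x uses the standard form sigma_tilde * f(zeta) with f(z) = z*Phi(z) + phi(z). The online rule then adds that value, scaled by a horizon factor `remaining`, to the current mean. Here `remaining` stands in for, e.g., the number of measurements left, or gamma/(1 - gamma) in a discounted setting. That weighting is our assumption about the general shape of such a policy, not a transcription of the paper's formula. All function names are hypothetical.

```python
import math


def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_factor(mu: list[float], sigma2: list[float], noise_var: float, x: int) -> float:
    """One-step value of information from measuring alternative x.

    Assumes independent normal beliefs N(mu[i], sigma2[i]) with sigma2[i] > 0,
    at least two alternatives, and Gaussian measurement noise of variance noise_var.
    """
    # Std dev of the change in the posterior mean of x after one measurement:
    # sigma_tilde^2 = sigma^2 - posterior variance = sigma^4 / (sigma^2 + noise_var)
    sigma_tilde = sigma2[x] / math.sqrt(sigma2[x] + noise_var)
    # Best competing mean among the other alternatives
    best_other = max(m for i, m in enumerate(mu) if i != x)
    zeta = -abs(mu[x] - best_other) / sigma_tilde
    return sigma_tilde * (zeta * std_normal_cdf(zeta) + std_normal_pdf(zeta))


def online_kg_choice(mu: list[float], sigma2: list[float], noise_var: float,
                     remaining: float) -> int:
    """Pick the alternative maximizing immediate reward plus weighted information value.

    `remaining` is a hypothetical horizon weight (assumption, see lead-in).
    Ties are broken in favor of the lowest index.
    """
    scores = [mu[x] + remaining * kg_factor(mu, sigma2, noise_var, x)
              for x in range(len(mu))]
    return max(range(len(mu)), key=lambda x: scores[x])


if __name__ == "__main__":
    # Equal means, but alternative 1 is more uncertain, so with many
    # measurements left it is worth exploring.
    print(online_kg_choice([1.0, 1.0], [1.0, 4.0], noise_var=1.0, remaining=10))
```

With `remaining=0` the rule is purely greedy (it picks the highest current mean). As `remaining` grows, alternatives whose measurement would most change the decision become more attractive. This captures the exploration/exploitation trade-off in a single look-ahead step.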