An Optimal Computing Budget Allocation Tree Policy for Monte Carlo Tree Search

成果类型:
Article
署名作者:
Li, Yunchuan; Fu, Michael C.; Xu, Jie
署名单位:
University System of Maryland; University of Maryland College Park; University System of Maryland; University of Maryland College Park; University System of Maryland; University of Maryland College Park; George Mason University
刊物名称:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISSBN:
0018-9286
DOI:
10.1109/TAC.2021.3088792
发表日期:
2022
页码:
2685-2699
关键词:
Search problems Heuristic algorithms games Monte Carlo methods Machine learning algorithms dynamic programming Space exploration Machine Learning Monte Carlo tree search (MCTS) optimization algorithms stochastic optimal control
摘要:
We analyze a tree search problem with an underlying Markov decision process, in which the goal is to identify the best action at the root that achieves the highest cumulative reward. We present a new tree policy that optimally allocates a limited computing budget to maximize a lower bound on the probability of correctly selecting the best action at each node. Compared to widely used upper confidence bound (UCB) tree policies, the new tree policy presents a more balanced approach to manage the exploration and exploitation tradeoff when the sampling budget is limited. Furthermore, UCB assumes that the support of reward distribution is known, whereas our algorithm relaxes this assumption. Numerical experiments demonstrate the efficiency of our algorithm in selecting the best action at the root.