Index-based policies for discounted multi-armed bandits on parallel machines
Document type:
Article
Authors:
Glazebrook, KD; Wilkinson, DJ
Affiliation:
Newcastle University - UK
Journal:
ANNALS OF APPLIED PROBABILITY
ISSN/ISBN:
1050-5164
Publication date:
2000
Pages:
877-896
Keywords:
conservation-laws
scheduling jobs
flowtime
systems
Abstract:
We utilize and develop elements of the recent achievable region account of Gittins indexation by Bertsimas and Niño-Mora to design index-based policies for discounted multi-armed bandits on parallel machines. The policies analyzed have expected rewards which come within an O(alpha) quantity of optimality, where alpha > 0 is a discount rate. In the main, the policies make an initial once-for-all allocation of bandits to machines, with each machine then handling its own workload optimally. This allocation must take careful account of the index structure of the bandits. The corresponding limit policies are average-overtaking optimal.