FINITE STATE MULTI-ARMED BANDIT PROBLEMS: SENSITIVE-DISCOUNT, AVERAGE-REWARD AND AVERAGE-OVERTAKING OPTIMALITY
Document type:
Article
Authors:
Katehakis, Michael N.; Rothblum, Uriel G.
Affiliations:
Rutgers University System; Rutgers University Newark; Rutgers University New Brunswick; Technion Israel Institute of Technology
Journal:
ANNALS OF APPLIED PROBABILITY
ISSN/ISBN:
1050-5164
Publication date:
1996
Pages:
1024-1034
Keywords:
Abstract:
We express Gittins indices for multi-armed bandit problems as Laurent expansions around discount factor 1. The coefficients of these expansions are then used to characterize stationary optimal policies when the optimality criteria are sensitive-discount optimality (otherwise known as Blackwell optimality), average-reward optimality and average-overtaking optimality. We also obtain bounds and derive optimality conditions for policies of a type that continue playing the same bandit as long as the state of that bandit remains in prescribed sets.