Risk-Averse Allocation Indices for Multiarmed Bandit Problem
Publication type:
Article
Authors:
Malekipirbazari, Milad; Cavus, Ozlem
Affiliation:
Ihsan Dogramaci Bilkent University
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2021.3053539
Publication date:
2021
Pages:
5522-5529
Keywords:
Markov processes
indexes
resource management
Heuristic algorithms
dynamic scheduling
Routing
Random variables
Coherent risk measures
dynamic allocation index
dynamic risk-aversion
Gittins index
multiarmed bandit (MAB)
Abstract:
In the classical multiarmed bandit problem, the aim is to find a policy maximizing the expected total reward, implicitly assuming that the decision-maker is risk-neutral. In some real-life applications, however, decision-makers are risk-averse. In this article, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multiarmed bandit problem in this novel setting and propose a priority-index heuristic that yields risk-averse allocation indices with a structure similar to the Gittins index. Although an optimal policy is shown not always to have an index-based form, empirical results demonstrate the strong performance of this heuristic and show that risk-averse allocation indices can achieve optimal or near-optimal interpretable policies.