Risk-Averse Allocation Indices for Multiarmed Bandit Problem

Document type:

Article

Authors:

Malekipirbazari, Milad; Cavus, Ozlem

Affiliation:

Ihsan Dogramaci Bilkent University

Journal:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL

ISSN/ISBN:

0018-9286

DOI:

10.1109/TAC.2021.3053539

Publication date:

2021

Pages:

5522-5529

Keywords:

Markov processes; indexes; resource management; heuristic algorithms; dynamic scheduling; routing; random variables; coherent risk measures; dynamic allocation index; dynamic risk-aversion; Gittins index; multiarmed bandit (MAB)

Abstract:

In the classical multiarmed bandit problem, the aim is to find a policy maximizing the expected total reward, implicitly assuming that the decision-maker is risk-neutral. In some real-life applications, however, decision-makers are risk-averse. In this article, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multiarmed bandit problem in this novel setting and propose a priority-index heuristic that yields risk-averse allocation indices with a structure similar to the Gittins index. Although an optimal policy is shown not always to have an index-based form, empirical results demonstrate the strong performance of this heuristic and show that risk-averse allocation indices can achieve optimal or near-optimal interpretable policies.