
The Irrevocable Multiarmed Bandit Problem

Document type:

Article

Authors:

Farias, Vivek F.; Madan, Ritesh

Affiliations:

Massachusetts Institute of Technology (MIT); Qualcomm

Journal:

OPERATIONS RESEARCH

ISSN/ISBN:

0030-364X

DOI:

10.1287/opre.1100.0891

Publication date:

2011

Pages:

383-399

Keywords:

efficient allocation rules; restless bandits; multiple plays; rewards

Abstract:

This paper considers the multiarmed bandit problem with multiple simultaneous arm pulls and the additional restriction that we do not allow recourse to arms that were pulled at some point in the past but then discarded. This additional restriction is highly desirable from an operational perspective, and we refer to this problem as the irrevocable multiarmed bandit problem. We observe that natural modifications to well-known heuristics for multiarmed bandit problems that satisfy this irrevocability constraint have unsatisfactory performance and, thus motivated, introduce a new heuristic: the packing heuristic. We establish through numerical experiments that the packing heuristic offers excellent performance, even relative to heuristics that are not constrained to be irrevocable. We also provide a theoretical analysis that studies the price of irrevocability, i.e., the performance loss incurred in imposing the constraint we propose on the multiarmed bandit model. We show that this performance loss is uniformly bounded for a general class of multiarmed bandit problems and indicate its dependence on various problem parameters. Finally, we obtain a computationally fast algorithm to implement the packing heuristic; the algorithm renders the packing heuristic computationally cheaper than methods that rely on the computation of Gittins indices.
