您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 管理科学与工程 > Operations Research > 2019 > 5期

Bandits with Global Convex Constraints and Objective

成果类型：

Article

署名作者：

Agrawal, Shipra; Devanur, Nikhil R.

署名单位：

Columbia University; Microsoft

刊物名称：

OPERATIONS RESEARCH

ISSN/ISSBN：

0030-364X

DOI：

10.1287/opre.2019.1840

发表日期：

2019

页码：

1486-1502

关键词：

DECISION-MAKING RISK

摘要：

We consider a very general model for managing the exploration-exploitation trade-off, which allows global convex constraints and concave objective on the aggregate decisions over time in addition to the customary limitation on the time horizon. This model provides a natural framework to study many sequential decision-making problems with long-term convex constraints and concave utility and subsumes the classic multiarmed bandit (MAB) model and the bandits with knapsacks problem as special cases. We demonstrate that a natural extension of the upper confidence bound family of algorithms for MAB provides a polynomial time algorithm with near-optimal regret guarantees for this substantially more general model. We also provide computationally more efficient algorithms by establishing interesting connections between this problem and other well-studied problems/algorithms, such as the Blackwell approachability problem, online convex optimization, and the Frank-Wolfe technique for convex optimization. We give several concrete examples of applications, particularly in risk-sensitive revenue management under unknown demand distributions, in which this more general bandit model of sequential decision making allows for richer formulations and more efficient solutions of the problem.

来源URL：

访问原文