您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 管理科学与工程 > Mathematics of Operations Research > 2014 > 4期

Partial Monitoring-Classification, Regret Bounds, and Algorithms

成果类型：

Article

署名作者：

Bartok, Gabor; Foster, Dean P.; Pal, David; Rakhlin, Alexander; Szepesvari, Csaba

署名单位：

Swiss Federal Institutes of Technology Domain; ETH Zurich; Yahoo! Inc; Alphabet Inc.; Google Incorporated; University of Pennsylvania; University of Alberta

刊物名称：

MATHEMATICS OF OPERATIONS RESEARCH

ISSN/ISSBN：

0364-765X

DOI：

10.1287/moor.2014.0663

发表日期：

2014

页码：

967-997

关键词：

minimizing regret prediction

摘要：

In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. In this paper we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero or scales as T-1/2, T-2/3, or T up to constants and logarithmic factors. We provide computationally efficient learning algorithms that achieve the minimax regret within a logarithmic factor for any game. In addition to the bounds on the minimax regret, if we assume that the outcomes are generated in an i.i.d. fashion, we prove individual upper bounds on the expected regret.