Partial Monitoring-Classification, Regret Bounds, and Algorithms
成果类型:
Article
署名作者:
Bartok, Gabor; Foster, Dean P.; Pal, David; Rakhlin, Alexander; Szepesvari, Csaba
署名单位:
Swiss Federal Institutes of Technology Domain; ETH Zurich; Yahoo! Inc; Alphabet Inc.; Google Incorporated; University of Pennsylvania; University of Alberta
刊物名称:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISSBN:
0364-765X
DOI:
10.1287/moor.2014.0663
发表日期:
2014
页码:
967-997
关键词:
minimizing regret
prediction
摘要:
In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. In this paper we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero or scales as T-1/2, T-2/3, or T up to constants and logarithmic factors. We provide computationally efficient learning algorithms that achieve the minimax regret within a logarithmic factor for any game. In addition to the bounds on the minimax regret, if we assume that the outcomes are generated in an i.i.d. fashion, we prove individual upper bounds on the expected regret.