Strategies for prediction under imperfect monitoring

Publication type:
Article
Authors:
Lugosi, Gabor; Mannor, Shie; Stoltz, Gilles
Affiliations:
ICREA; Pompeu Fabra University; Pompeu Fabra University; McGill University; Universite PSL; Ecole Normale Superieure (ENS); Centre National de la Recherche Scientifique (CNRS); Centre National de la Recherche Scientifique (CNRS); Hautes Etudes Commerciales (HEC) Paris
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.1080.0312
Publication date:
2008
Pages:
513-528
Keywords:
universal prediction gradient regret
Abstract:
We propose simple randomized strategies for sequential decision (or prediction) under imperfect monitoring, that is, when the decision maker (forecaster) does not have access to the past outcomes but rather to a feedback signal. The proposed strategies are consistent in the sense that they achieve, asymptotically, the best-possible average reward among all fixed actions. It was Rustichini [Rustichini, A. 1999. Minimizing regret: The general case. Games Econom. Behav. 29 224-243] who first proved the existence of such consistent predictors. The forecasters presented here offer the first constructive proof of consistency. Moreover, the proposed algorithms are computationally efficient. We also establish upper bounds for the rates of convergence. In the case of deterministic feedback signals, these rates are optimal up to logarithmic terms.
Source URL: