您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 概率 > The Annals of Applied Probability > 2017 > 5期

REINFORCEMENT LEARNING FROM COMPARISONS: THREE ALTERNATIVES ARE ENOUGH, TWO ARE NOT

成果类型：

Article

署名作者：

Laslier, Benoit; Laslier, Jean-Francois

署名单位：

Universite Paris Cite; Paris School of Economics; Centre National de la Recherche Scientifique (CNRS); Universite PSL; Ecole Normale Superieure (ENS)

刊物名称：

ANNALS OF APPLIED PROBABILITY

ISSN/ISSBN：

1050-5164

DOI：

10.1214/16-AAP1271

发表日期：

2017

页码：

2907-2925

关键词：

optimal strategies tournament games set aggregation

摘要：

This paper deals with two generalizations of the Polya urn model where, instead of sampling one ball from the urn at each time, we sample two or three balls. The processes are defined on the basis of the problem of finding the best alternative using pairwise comparisons which are not necessarily transitive: they can be thought of as evolutionary processes that tend to reinforce currently efficient alternatives. The two processes exhibit different behaviors: with three balls sampled, we prove almost sure convergence towards the unique optimal solution of the comparisons problem while, in some cases, the process with two balls sampled has almost surely no limit. This is an example of a natural reinforcement model with no exchangeability whose asymptotic behavior can be precisely characterized.