REINFORCEMENT LEARNING FROM COMPARISONS: THREE ALTERNATIVES ARE ENOUGH, TWO ARE NOT
成果类型:
Article
署名作者:
Laslier, Benoit; Laslier, Jean-Francois
署名单位:
Universite Paris Cite; Paris School of Economics; Centre National de la Recherche Scientifique (CNRS); Universite PSL; Ecole Normale Superieure (ENS)
刊物名称:
ANNALS OF APPLIED PROBABILITY
ISSN/ISSBN:
1050-5164
DOI:
10.1214/16-AAP1271
发表日期:
2017
页码:
2907-2925
关键词:
optimal strategies
tournament games
set
aggregation
摘要:
This paper deals with two generalizations of the Polya urn model where, instead of sampling one ball from the urn at each time, we sample two or three balls. The processes are defined on the basis of the problem of finding the best alternative using pairwise comparisons which are not necessarily transitive: they can be thought of as evolutionary processes that tend to reinforce currently efficient alternatives. The two processes exhibit different behaviors: with three balls sampled, we prove almost sure convergence towards the unique optimal solution of the comparisons problem while, in some cases, the process with two balls sampled has almost surely no limit. This is an example of a natural reinforcement model with no exchangeability whose asymptotic behavior can be precisely characterized.