Why imitate, and if so, how? A boundedly rational approach to multi-armed bandits
Publication type:
Article
Author:
Schlag, KH
Affiliation:
University of Bonn
Journal:
JOURNAL OF ECONOMIC THEORY
ISSN:
0022-0531
DOI:
10.1006/jeth.1997.2347
Publication date:
1998
Pages:
130-156
Keywords:
Abstract:
Individuals in a finite population repeatedly choose among actions yielding uncertain payoffs. Between choices, each individual observes the action and realized outcome of one other individual. We restrict our search to learning rules with limited memory that increase expected payoffs regardless of the distribution underlying their realizations. It is shown that the rule that outperforms all others is that which imitates the action of an observed individual (whose realized outcome is better than self) with a probability proportional to the difference in these realizations. When each individual uses this best rule, the aggregate population behavior is approximated by the replicator dynamic. © 1998 Academic Press.
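
The imitation rule described in the abstract can be illustrated with a minimal sketch. It is not taken from the article itself. It assumes that payoffs lie in a known bounded interval and that the proportionality constant is one over the width of that interval; the function name and the parameters payoff_min and payoff_max are hypothetical and chosen only for illustration.

```python
import random


def proportional_imitation(own_action, own_payoff, obs_action, obs_payoff,
                           payoff_min, payoff_max, rng=random):
    """Pick the next action after observing one other individual.

    Assumption (not stated in the abstract): payoffs are bounded in
    [payoff_min, payoff_max], and the switching probability is the payoff
    difference divided by the width of that interval.
    """
    span = payoff_max - payoff_min
    # Never imitate an individual who did no better than oneself.
    if span <= 0 or obs_payoff <= own_payoff:
        return own_action
    # Switching probability is proportional to the payoff difference.
    switch_prob = (obs_payoff - own_payoff) / span
    return obs_action if rng.random() < switch_prob else own_action
```

Under these assumptions, an individual who observes a payoff no higher than its own always keeps its action, and one who earned the lowest possible payoff while observing the highest possible payoff always switches.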