Unilateral incentive alignment in two-agent stochastic games
Publication Type:
Article
Authors:
McAvoy, Alex; Sehwag, Udari Madhushani; Hilbe, Christian; Chatterjee, Krishnendu; Barfuss, Wolfram; Su, Qi; Leonard, Naomi Ehrich; Plotkin, Joshua B.
Affiliations:
University of North Carolina Chapel Hill; University of North Carolina School of Medicine; Stanford University; Max Planck Society; Institute of Science & Technology - Austria; University of Bonn; Shanghai Jiao Tong University; Princeton University; University of Pennsylvania
Journal:
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
ISSN/ISBN:
0027-8424
DOI:
10.1073/pnas.2319927121
Publication Date:
2025-06-24
Keywords:
zero-determinant strategies
autocratic strategies
evolution
Abstract:
Multiagent learning is challenging when agents face mixed-motivation interactions, where conflicts of interest arise as agents independently try to optimize their respective outcomes. Recent advances in evolutionary game theory have identified a class of zero-determinant strategies, which give an agent significant unilateral control over outcomes in repeated games. Building on these insights, we present a comprehensive generalization of zero-determinant strategies to stochastic games, encompassing dynamic environments. We propose an algorithm that allows an agent to discover strategies enforcing predetermined linear (or approximately linear) payoff relationships. Of particular interest is the relationship in which both payoffs are equal, which serves as a proxy for fairness in symmetric games. We demonstrate that an agent can discover strategies enforcing such relationships through experience alone, without coordinating with an opponent. In finding and using such a strategy, an agent (enforcer) can incentivize optimal and equitable outcomes, circumventing potential exploitation. In particular, from the opponent's viewpoint, the enforcer transforms a mixed-motivation problem into a cooperative problem, paving the way for more collaboration and fairness in multiagent systems.
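Illustrative code sketch (not from the paper):
As a minimal illustration of the linear payoff relationships described in the abstract, the sketch below numerically checks the classic single-state special case: in the repeated prisoner's dilemma, Tit-for-Tat is a zero-determinant strategy that enforces the equal-payoff relationship pi_X = pi_Y against any memory-one co-player. The payoff values (T, R, P, S) = (5, 3, 1, 0) and the stationary-distribution computation are illustrative assumptions and do not reproduce the authors' learning algorithm for stochastic games.

import numpy as np

rng = np.random.default_rng(0)

# States are the previous joint outcomes, ordered (X, Y): CC, CD, DC, DD.
S_X = np.array([3.0, 0.0, 5.0, 1.0])   # X's payoff in each state
S_Y = np.array([3.0, 5.0, 0.0, 1.0])   # Y's payoff in each state

p = np.array([1.0, 0.0, 1.0, 0.0])     # enforcer: Tit-for-Tat (repeat Y's last move)

def stationary_payoffs(p, q):
    """Payoffs under the stationary distribution of the induced Markov chain."""
    M = np.zeros((4, 4))
    for s in range(4):
        for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:   # 0 = cooperate, 1 = defect
            px = p[s] if x == 0 else 1.0 - p[s]
            qy = q[s] if y == 0 else 1.0 - q[s]
            M[s, 2 * x + y] = px * qy
    # Solve v M = v together with sum(v) = 1.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v @ S_X, v @ S_Y

for _ in range(5):
    q = rng.uniform(0.05, 0.95, size=4)   # random fully mixed memory-one co-player
    pi_X, pi_Y = stationary_payoffs(p, q)
    print(f"pi_X = {pi_X:.4f}, pi_Y = {pi_Y:.4f}")   # equal up to numerical error

Each printed pair should coincide regardless of the co-player's strategy, which is the sense in which the enforcer unilaterally pins down a linear payoff relationship; the paper generalizes this idea from the single-state repeated game to stochastic games with changing environmental states.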