AI agents learn cooperation with emergent reputation via COOPER method
New multi-agent RL system jointly discovers reputation norms and cooperative strategies from scratch.
A new paper on arXiv introduces COOPER (COOPeration with Emergent Reputation), a distributed multi-agent reinforcement learning framework that tackles the long-standing challenge of promoting cooperation in social dilemmas through reputation. Previous approaches either assumed predefined rules for assessing reputation or modeled reputation as an intrinsic reward, which limited how well they generalized and adapted. COOPER instead learns the rules for assessing peer reputation and the cooperative policies at the same time, using only the environment's reward signal. Reputation and policy are deeply entangled, which makes the feedback each receives delayed and noisy. To cope with this, the researchers carefully designed COOPER's constituent modules and the data flows between them.
Experiments on the donation game and the coin game in grid-world environments demonstrate that COOPER adapts effectively to various existing reputation systems and co-players. Notably, in self-play settings, the system exhibits the co-emergence of reputation norms and cooperative behavior without any human-designed heuristics. These results hold robustly across different social network topologies, underlining the generalizability and efficacy of the approach. The work has implications for decentralized AI systems where agents have limited perception and cognitive capabilities, such as autonomous vehicle coordination, distributed robotics, and peer-to-peer networks.
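For readers unfamiliar with the donation game, the sketch below shows the kind of hand-written reputation rule that COOPER is meant to replace with a learned one. It is not the paper's implementation: the assessment rule (simple "image scoring"), the discriminator policy, the benefit and cost values, and every function and parameter name are assumptions chosen for illustration.

```python
import random

BENEFIT = 2.0  # assumed benefit received by the recipient of a donation
COST = 1.0     # assumed cost paid by a donor who donates (benefit > cost)


def image_scoring(donor_cooperated: bool) -> int:
    """Hand-written assessment rule: good (1) if the donor donated, else bad (0)."""
    return 1 if donor_cooperated else 0


def discriminator_policy(recipient_reputation: int) -> bool:
    """Hand-written policy: donate only to recipients in good standing."""
    return recipient_reputation == 1


def play_donation_game(num_agents=10, num_defectors=0, rounds=1000, seed=0):
    """Run random pairwise donation rounds.

    Agents with index < num_defectors never donate; all others follow
    discriminator_policy. Returns (cooperation_rate, payoffs).
    """
    rng = random.Random(seed)
    reputation = [1] * num_agents   # everyone starts in good standing
    payoffs = [0.0] * num_agents
    cooperations = 0

    for _ in range(rounds):
        donor, recipient = rng.sample(range(num_agents), 2)
        if donor < num_defectors:
            cooperate = False
        else:
            cooperate = discriminator_policy(reputation[recipient])

        if cooperate:
            payoffs[donor] -= COST
            payoffs[recipient] += BENEFIT
            cooperations += 1

        # Observers update the donor's reputation with the fixed rule.
        reputation[donor] = image_scoring(cooperate)

    return cooperations / rounds, payoffs


if __name__ == "__main__":
    rate, _ = play_donation_game(num_defectors=2)
    print(f"cooperation rate with 2 defectors: {rate:.2f}")
```

In this toy version both `image_scoring` and `discriminator_policy` are fixed by hand. According to the paper's summary, COOPER learns both pieces jointly from the environment reward alone.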
- COOPER jointly learns reputation assessment rules and reputation-based policies solely from environment rewards, removing the need for predefined rules.
- Tested on donation and coin games, the method adapts to various existing reputation systems and co-players in grid-world environments.
- Self-play leads to co-emergence of reputation norms and cooperation, robust across diverse social network topologies.
Why It Matters
COOPER could enable adaptive cooperation in decentralized AI systems, from robotics to autonomous networks, without human-designed reputation rules.