ConventionPlay: Capability-Limited Training for Robust Ad-Hoc Collaboration
New RL method teaches agents to probe partner capabilities and steer teams toward optimal strategies.
A team of researchers including Abhishek Sriraman, Eleni Vasilaki, and Robert Loftin has introduced ConventionPlay, a novel reinforcement learning framework designed to solve a fundamental problem in multi-agent AI collaboration: agents that must work together without pre-established protocols need to identify shared conventions while actively steering the team toward optimal joint strategies. ConventionPlay extends traditional cognitive hierarchy models by incorporating a diverse population of adaptive followers with varying capability limits, forcing the agent being trained to develop sophisticated probing behaviors.
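The paper's implementation is not reproduced here, but the core ingredient, a follower population with varying capability limits, can be sketched in a few lines of Python. Everything below (the `CapabilityLimitedFollower` class, the convention table, the belief-update rule) is an illustrative assumption rather than the authors' code:

```python
import random

# Hypothetical sketch, not the paper's code: a follower that only "knows"
# a subset of the game's conventions (its capability limit) and adapts
# toward whichever known convention best explains the partner's actions.
class CapabilityLimitedFollower:
    def __init__(self, known_conventions):
        self.known = list(known_conventions)        # the capability limit
        self.belief = {c: 1.0 for c in self.known}  # unnormalized weights

    def observe(self, partner_action, conventions):
        # Upweight known conventions consistent with the partner's action.
        for c in self.known:
            if conventions[c] == partner_action:
                self.belief[c] *= 2.0

    def act(self, conventions):
        # Play the convention it currently finds most plausible; it can
        # never produce a convention outside its capability limit.
        best = max(self.known, key=lambda c: self.belief[c])
        return conventions[best]

# A diverse training population: followers with different capability limits.
ALL_CONVENTIONS = {"A": 0, "B": 1, "C": 2}  # convention name -> joint action
population = [
    CapabilityLimitedFollower(random.sample(list(ALL_CONVENTIONS), k))
    for k in (1, 2, 3) for _ in range(5)
]
```

A real population would presumably span richer behaviors (different adaptation speeds, noise, reasoning depth), but even this toy version makes any single fixed strategy suboptimal for the leader.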
Unlike previous approaches that focused primarily on adaptation, ConventionPlay agents learn to actively assess their partner's repertoire of possible conventions. This lets them decide strategically when to lead the team toward more effective strategies and when to follow established patterns. The researchers tested the approach on canonical coordination tasks and found that ConventionPlay achieves superior coordination efficiency, particularly in settings where different conventions offer differentiated payoffs. This is a significant step toward AI systems that can fluidly collaborate with unfamiliar partners in complex, real-world environments.
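Continuing that sketch, a hand-coded version of the probe-then-steer behavior in a repeated coordination game with differentiated payoffs might look like the following; the payoff table and the mismatch heuristic are assumptions chosen for illustration, not the paper's learned policy:

```python
# Hypothetical probe-then-steer loop for a repeated coordination game,
# reusing the illustrative follower above. Payoffs are differentiated:
# matching on action a earns PAYOFF[a]; a mismatch earns nothing.
PAYOFF = {0: 1.0, 1: 2.0, 2: 5.0}  # convention "C" pays best

def episode(follower, conventions, steps=10):
    # Probe the most valuable conventions first, watching for convergence.
    targets = sorted(conventions, key=lambda c: -PAYOFF[conventions[c]])
    target_idx, total = 0, 0.0
    for t in range(steps):
        my_action = conventions[targets[target_idx]]
        partner_action = follower.act(conventions)
        total += PAYOFF[my_action] if my_action == partner_action else 0.0
        follower.observe(my_action, conventions)
        # Steer: persistent mismatch suggests the partner lacks this
        # convention, so lead the team to the next-best one it can follow.
        if t >= 2 and partner_action != my_action:
            target_idx = min(target_idx + 1, len(targets) - 1)
    return total
```

Against a partner who knows only conventions "A" and "B", this leader probes the high-payoff "C", detects the persistent mismatch, and settles the team on "B" rather than sacrificing the episode to an unreachable optimum.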
The technical innovation lies in ConventionPlay's training methodology, where agents are exposed to partners with deliberately limited capabilities. This creates a more realistic and challenging learning environment that mirrors real-world collaboration scenarios where partners may have different skill levels or knowledge bases. By learning to probe and respond to these limitations, ConventionPlay agents develop more robust collaboration strategies that can generalize to new partners and situations. The approach shows particular promise for applications requiring flexible teamwork between heterogeneous AI systems or between humans and AI assistants.
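The training setup can likewise be sketched as partner randomization over capability limits. Here `update_policy` is a hypothetical stand-in for whatever RL update is used, and in the actual method the probing behavior executed by `episode` would itself be learned rather than hand-coded:

```python
# Hypothetical training loop: every episode pairs the learner with a fresh
# partner of a random capability limit, so high return requires probing
# what the partner can do rather than memorizing a single convention.
def train(update_policy, conventions, episodes=10_000):
    names = list(conventions)
    for _ in range(episodes):
        k = random.randint(1, len(names))  # vary the capability limit
        partner = CapabilityLimitedFollower(random.sample(names, k))
        update_policy(episode(partner, conventions))  # e.g., a policy-gradient step
```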
- Extends cognitive hierarchies with diverse populations of adaptive followers for more realistic training
- Agents learn to actively probe partner capabilities rather than simply adapting to existing conventions
- Demonstrates superior coordination efficiency in tasks with differentiated payoff structures
Why It Matters
Enables more robust AI collaboration in real-world scenarios where agents must work with unfamiliar partners and dynamically choose optimal strategies.