Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus
New transformer architecture achieves superior performance on StarCraft II and MuJoCo by eliminating action-order sensitivity.
A team of researchers has introduced the Consensus Multi-Agent Transformer (CMAT), an architecture designed to address core challenges in cooperative Multi-Agent Reinforcement Learning (MARL). Traditional MARL splits a centralized control problem across multiple agents, which often leads to unstable training, weak coordination, and non-stationarity, since each agent's environment shifts as the other agents learn. CMAT reframes the task as a hierarchical single-agent problem, treating all agents as a unified entity. It encodes the large joint observation space with a Transformer encoder and handles the joint action space with a hierarchical mechanism.
The core innovation is a Transformer decoder that autoregressively generates a high-level 'consensus vector' in a latent space, simulating how agents reach an agreement on strategy. Conditioned on this consensus, all agents then generate their actions simultaneously. This eliminates the sensitivity to action-generation order that plagues conventional Multi-Agent Transformers (MAT), enabling truly order-independent decision-making. This factorization allows the entire joint policy to be optimized using the stable, well-understood single-agent Proximal Policy Optimization (PPO) algorithm while preserving sophisticated coordination.
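The two-stage idea can be illustrated with a toy sketch. This is not the authors' implementation: `consensus_vector`, `agent_action`, and `act` are hypothetical names, the arithmetic stands in for the Transformer encoder and decoder, and the mean-pooling of observations is an assumption made so the toy consensus does not depend on agent order. It only shows that once a shared consensus exists, each agent's action depends on its own observation and that consensus, never on another agent's action.

```python
import math

def consensus_vector(joint_obs, steps=3):
    """Toy stand-in for the decoder: emit a latent vector one element at a time,
    each element conditioned on the pooled observations and on earlier elements."""
    # Assumption: order-robust mean pooling replaces the real Transformer encoder.
    pooled = math.fsum(math.fsum(o) for o in joint_obs) / len(joint_obs)
    z = []
    for k in range(steps):
        # Autoregressive step: element k sees the elements generated so far.
        z.append(math.tanh(pooled + 0.5 * math.fsum(z) - k))
    return z

def agent_action(obs, z, n_actions=4):
    """Toy per-agent head: greedy action from own observation plus shared consensus."""
    bias = math.fsum(z)
    scores = [math.sin((a + 1) * (math.fsum(obs) + bias)) for a in range(n_actions)]
    return max(range(n_actions), key=scores.__getitem__)

def act(joint_obs):
    z = consensus_vector(joint_obs)                  # stage 1: one shared consensus
    return [agent_action(o, z) for o in joint_obs]   # stage 2: independent heads

obs = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]
actions = act(obs)

# Reordering the agents only reorders the outputs; no agent's action changes.
perm = [2, 0, 1]
permuted_actions = act([obs[i] for i in perm])
assert permuted_actions == [actions[i] for i in perm]
```

In a sequential scheme, by contrast, the second stage would feed earlier agents' actions into later agents' heads, so the result could change with the processing order.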
According to the researchers, CMAT outperformed recent centralized solutions, sequential MARL methods, and conventional baselines across major benchmarks: the micromanagement tasks of StarCraft II, the continuous control tasks of Multi-Agent MuJoCo, and the strategic gameplay of Google Research Football. By pairing expressive coordination through latent consensus with the training stability of single-agent methods, the model is presented as a step toward more reliable and scalable multi-agent AI systems for real-world applications such as robotics and autonomous systems.
- CMAT uses a Transformer decoder to generate a latent consensus vector, enabling simultaneous action generation for all agents and eliminating order-sensitivity.
- The framework lets the entire multi-agent policy be trained with single-agent PPO, bringing the training stability and well-understood behavior of single-agent methods to the multi-agent setting.
- CMAT reportedly achieves state-of-the-art performance on StarCraft II, Multi-Agent MuJoCo, and Google Research Football, outperforming recent centralized and sequential MARL baselines.
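To make the single-agent PPO point concrete, here is a minimal sketch of the standard PPO clipped surrogate applied to one joint decision. It assumes, as an illustration rather than a detail confirmed by the source, that the joint log-probability is the consensus term plus the sum of per-agent terms, with agents conditionally independent given the consensus. The function names and the numbers are hypothetical.

```python
import math

def joint_log_prob(consensus_logp, agent_logps):
    """Assumed factorization: log pi(joint) = log p(consensus) + sum_i log p(a_i | consensus)."""
    return consensus_logp + math.fsum(agent_logps)

def ppo_clip_objective(new_logp, old_logp, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample (to be maximized)."""
    ratio = math.exp(new_logp - old_logp)             # probability ratio new / old
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))   # clamp ratio to [1 - eps, 1 + eps]
    return min(ratio * advantage, clipped * advantage)

# Hypothetical log-probabilities for one joint decision by two agents.
old_logp = joint_log_prob(-1.0, [-0.5, -0.7])
new_logp = joint_log_prob(-0.9, [-0.4, -0.6])

# With a positive advantage, the gain is capped once the ratio exceeds 1 + eps.
objective = ppo_clip_objective(new_logp, old_logp, advantage=2.0)
```

Because the whole team is scored through one probability ratio, the update takes the same form as in single-agent PPO, which is the property the summary attributes to CMAT.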
Why It Matters
CMAT offers a more stable and theoretically grounded path to developing coordinated AI for robotics, autonomous vehicles, and complex game AI.