Self-Anchored Consensus protocol protects LLM agents from Byzantine attacks
New decentralized protocol prevents rogue AI agents from poisoning swarm intelligence
Large language model (LLM) agents increasingly collaborate over peer-to-peer networks to improve reliability, but this interaction also introduces a vulnerability: a single malicious or faulty (Byzantine) agent can sway an entire swarm toward incorrect conclusions. Existing defenses rely on centralized leaders or self-reported confidence, both of which are easily compromised. A new paper introduces Self-Anchored Consensus (SAC), a fully decentralized protocol designed to address this problem. In SAC, each agent iteratively exchanges responses with its neighbors, evaluates incoming messages locally using an anchoring mechanism, filters out unreliable information, and refines its own output. The framework provides formal (F+1)-robustness conditions for the underlying communication graph, proving that honest agents can preserve and propagate reliable information as long as each honest node has at least F+1 honest neighbors.
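The exchange, filter, and refine loop described above can be illustrated with a toy sketch. This is not the paper's algorithm: the summary does not specify how the anchoring mechanism scores messages, so the sketch assumes numeric answers, treats each agent's own current answer as its anchor, and keeps only messages within a hypothetical tolerance `tol` of that anchor. The function name `sac_round`, the `attack_value` parameter, and the five-agent ring are all illustrative assumptions.

```python
from statistics import mean


def sac_round(values, neighbors, byzantine, tol=5.0, attack_value=1000.0):
    """One synchronous round: exchange, filter against own anchor, refine.

    Assumption: answers are numbers and the "anchor" is the agent's own
    current answer. The real protocol evaluates LLM responses instead.
    """
    new_values = dict(values)
    for agent, own in values.items():
        if agent in byzantine:
            continue  # Byzantine agents do not follow the protocol
        # Exchange: collect what each neighbor sends (attackers send garbage).
        incoming = [
            attack_value if n in byzantine else values[n]
            for n in neighbors[agent]
        ]
        # Filter: keep only messages close to this agent's own anchor.
        kept = [v for v in incoming if abs(v - own) <= tol]
        # Refine: blend the agent's own answer with the surviving messages.
        new_values[agent] = mean([own] + kept)
    return new_values


# Hypothetical setup: a ring of 5 agents in which agent 4 is Byzantine.
neighbors = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
values = {0: 41.0, 1: 42.0, 2: 43.0, 3: 42.0, 4: 0.0}
byzantine = {4}

for _ in range(10):
    values = sac_round(values, neighbors, byzantine)

honest = {a: v for a, v in values.items() if a not in byzantine}
print(honest)
```

In this toy setting the attacker's message is always rejected, so every honest value stays inside the range spanned by the initial honest answers. A real anchoring mechanism would have to judge free-form LLM responses rather than compare numbers.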
Experiments on mathematical reasoning (GSM8K) and commonsense reasoning (CSQA) benchmarks demonstrate SAC's effectiveness. Under attack scenarios with up to 30% Byzantine agents, SAC maintains near-baseline accuracy, while prior methods such as majority voting and confidence-based filtering degrade sharply. The protocol also adapts to various graph topologies (e.g., ring, scale-free, random) without reconfiguration. By eliminating the need for a central coordinator and for trust in any single agent, SAC marks a step toward robust, decentralized AI collaboration. The work has implications for safety-critical multi-agent systems in robotics, finance, and distributed decision-making.
- SAC requires no central coordinator, removing single points of failure and leader-based vulnerabilities
- Achieves (F+1)-robustness: each honest agent needs only F+1 honest neighbors to maintain correctness against up to F Byzantine agents
- Experiments show SAC preserves 95%+ accuracy on GSM8K and CSQA under 30% Byzantine attacks, while prior methods drop below 60%
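The neighbor condition in the second point can be checked on a candidate topology with a few lines of code. The sketch below assumes the condition is exactly the degree count stated in this summary (each honest agent has at least F+1 honest neighbors); the paper's formal (F+1)-robustness definition may be stricter. The function name and the example graphs are hypothetical. Because real deployments do not know which agents are Byzantine, this is an analysis-time check against an assumed attacker set.

```python
def meets_honest_neighbor_condition(neighbors, byzantine, f):
    """Return True if every honest agent has at least f + 1 honest neighbors.

    neighbors: dict mapping each agent to a list of its neighbors
    byzantine: set of agents assumed to be Byzantine (for analysis only)
    """
    return all(
        sum(1 for n in nbrs if n not in byzantine) >= f + 1
        for agent, nbrs in neighbors.items()
        if agent not in byzantine
    )


# Two hypothetical 6-agent topologies with agent 0 assumed Byzantine (F = 1).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
complete = {i: [j for j in range(6) if j != i] for i in range(6)}

ring_ok = meets_honest_neighbor_condition(ring, {0}, f=1)
complete_ok = meets_honest_neighbor_condition(complete, {0}, f=1)
print(ring_ok, complete_ok)
```

Here the ring fails because the attacker's two neighbors are each left with a single honest neighbor, while the complete graph passes. Sparse topologies may therefore tolerate fewer Byzantine agents under this simplified reading of the condition.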
Why It Matters
Enables safe, decentralized deployment of LLM swarms in real-world applications without trusting any single agent