Agent Frameworks

The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate

Peer debate among homogeneous AI agents often backfires, raising costs and errors.

Deep Dive

A new arXiv study from researchers at the Jožef Stefan Institute challenges the prevailing assumption that multi-agent debate reliably improves LLM accuracy. In controlled experiments with N=10 homogeneous agents (Qwen2.5-7B, Llama-3.1-8B, Ministral-3-8B) across three debate rounds on GSM-Hard and MMLU-Hard, the team identified three distinct failure pathways. Sycophantic conformity led agents to adopt majority answers uncritically, with modal adoption rates reaching 85.5%. Contextual fragility meant that peer rationales destabilized previously correct reasoning, with vulnerability rates up to 70%. Finally, consensus collapse occurred when plurality voting discarded correct answers already present in the generation pool, creating an oracle gap of 32.3 percentage points.

Ablations over communication density and sampling temperature showed that even minimal peer exposure (K=2) triggered high conformity, which intensified as initial answer diversity increased. Across all configurations, debate consumed 2.1–3.4× more tokens (up to 28,631 tokens per problem) than isolated self-correction, while delivering equal or lower accuracy. The authors conclude that within the 7–8B parameter class, unguided homogeneous teams without structured roles do not benefit from peer exchange, making self-correction the more favorable cost–accuracy choice.
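The consensus-collapse metric can be illustrated with a short sketch: plurality voting picks the pool's most common answer, while an "oracle" selector would credit any problem where the correct answer appears anywhere in the pool; the difference is the oracle gap. This is a hypothetical illustration of the concept, not the paper's code, and the toy answer pools below are invented.

```python
from collections import Counter

def plurality_vote(answers):
    # Most common answer in the pool (ties broken arbitrarily).
    return Counter(answers).most_common(1)[0][0]

def oracle_gap(problems):
    # Oracle accuracy (correct answer present anywhere in the pool)
    # minus plurality-vote accuracy, as a fraction of problems.
    vote_hits = sum(plurality_vote(p["answers"]) == p["gold"] for p in problems)
    oracle_hits = sum(p["gold"] in p["answers"] for p in problems)
    n = len(problems)
    return oracle_hits / n - vote_hits / n

# Toy pools from 10 agents. In the first problem the correct "42" is
# in the pool, but the majority converged on "40": consensus collapse.
problems = [
    {"answers": ["40"] * 8 + ["42"] * 2, "gold": "42"},
    {"answers": ["7"] * 10, "gold": "7"},
]
print(oracle_gap(problems))  # 0.5: voting loses one recoverable problem
```

A positive gap means correct answers were generated but then voted away, which is why the study's 32.3-point gap indicts the aggregation step rather than the agents' generation ability.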

Key Points
  • Homogeneous multi-agent debate (10 agents of the same model) showed up to 85.5% sycophantic conformity, where agents adopted majority answers uncritically.
  • Contextual fragility destabilized correct reasoning in up to 70% of cases when peer rationales were introduced.
  • Debate used 2.1–3.4× more tokens (up to 28,631 per problem) than self-correction, with no accuracy gain.

Why It Matters

Organizations relying on multi-agent LLM systems should reconsider unguided debate: self-correction saves compute and reduces error.