Ringelmann Effect in Multi-Agent LLMs: 30 Agents = 1 Agent on MMLU-Hard

arXiv cs.MA June 03, 2026

⚡ Adding more LLM agents doesn't always help: a new scaling law shows hard ceilings that a pilot with N≤5 agents can predict.

Deep Dive

A new paper from researchers Blaž Bertalanič and Carolina Fortuna applies the classic Ringelmann Effect, in which individual effort decreases as team size grows, to multi-agent LLM systems. They derive a two-parameter scaling law, R(N) = N_eff/N = 1/(1 + c(N-1)N^{-β}), where N is the number of agents, N_eff is the effective number of agents, and c and β are the two fitted parameters. The law classifies configurations into three asymptotic regimes: hard-ceiling (β=0), where N_eff tops out at 1/c; sublinear (0<β<1), where N_eff grows as ~N^β/c; and linear (β≥1), where gains are unbounded. The law was tested across 44 model×task×condition cells spanning peer debate, self-correction, random-noise placebo, self-consistency, open-weight families (Qwen, Llama, Ministral from 7B to 32B), the Gemini API, thinking models, heterogeneous teams, and sparse communication, with every fit reaching R² > 0.99.
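To make the three regimes concrete, here is a minimal Python sketch that evaluates the scaling law exactly as quoted above. The function names (`retention`, `effective_agents`, `regime`) and the example values of c and β are illustrative assumptions, not taken from the paper.

```python
def retention(n: int, c: float, beta: float) -> float:
    """Per-agent retention R(N) = 1 / (1 + c (N-1) N^(-beta))."""
    return 1.0 / (1.0 + c * (n - 1) * n ** (-beta))


def effective_agents(n: int, c: float, beta: float) -> float:
    """Effective team size N_eff = N * R(N)."""
    return n * retention(n, c, beta)


def regime(beta: float, tol: float = 1e-9) -> str:
    """Classify the asymptotic regime from beta (assumes beta >= 0)."""
    if beta <= tol:
        return "hard-ceiling"   # N_eff saturates at 1/c
    if beta < 1.0:
        return "sublinear"      # N_eff grows roughly as N^beta / c
    return "linear"             # N_eff keeps growing in proportion to N


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the three shapes.
    for c, beta in [(0.5, 0.0), (0.5, 0.5), (0.5, 1.0)]:
        sizes = [1, 5, 30, 1000]
        values = [round(effective_agents(n, c, beta), 2) for n in sizes]
        print(regime(beta), dict(zip(sizes, values)))
```

With β=0 the printed N_eff values flatten out just below 1/c no matter how large N gets, which is the hard-ceiling behavior the article describes.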

Three practical findings stand out. First, on MMLU-Hard, thirty dense debating agents produce no more answer diversity than a single agent, which is the hard-ceiling regime. Second, a noise placebo tracks self-correction on free-form math at 4× scale, meaning gains commonly attributed to “debate” actually come from re-evaluation, not peer content. Third, a single pilot run with N≤5 agents predicts the N=30 structural ceiling, and only architectural diversity (heterogeneous teams) lowers the c parameter to escape the hard-ceiling regime; communication-mode interventions did not help. The practical upshot: before scaling up an agent team, run a quick pilot with a few agents to check whether the configuration is in a hard-ceiling regime.
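The pilot-run idea can be sketched as a small fitting routine. This summary does not say how the paper measures R(N) or which fitting procedure it uses, so the code below is an assumption: it takes already-measured retention values from a pilot with N≤5, fits c and β with an ordinary log-linear least-squares fit (a stand-in technique, not necessarily the authors'), and extrapolates to N=30. All function names and the synthetic pilot data are hypothetical.

```python
import math


def fit_scaling_law(ns, rs):
    """Fit c and beta in R(N) = 1 / (1 + c (N-1) N^(-beta)).

    Uses the linearization log((1/R - 1) / (N - 1)) = log c - beta * log N
    and ordinary least squares. This is an assumed procedure for illustration.
    """
    xs, ys = [], []
    for n, r in zip(ns, rs):
        if n < 2 or not (0.0 < r < 1.0):
            continue  # N=1 carries no information; the log needs 0 < R < 1
        xs.append(math.log(n))
        ys.append(math.log((1.0 / r - 1.0) / (n - 1)))
    if len(set(xs)) < 2:
        raise ValueError("need usable measurements at two or more team sizes")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    beta = -slope                            # slope of the line is -beta
    c = math.exp(mean_y - slope * mean_x)    # intercept of the line is log c
    return c, beta


def predict_n_eff(n, c, beta):
    """Extrapolated effective team size N_eff = N * R(N)."""
    return n / (1.0 + c * (n - 1) * n ** (-beta))


if __name__ == "__main__":
    # Synthetic pilot data generated from c=0.8, beta=0 (not real measurements).
    pilot_ns = [2, 3, 4, 5]
    pilot_rs = [1.0 / (1.0 + 0.8 * (n - 1)) for n in pilot_ns]
    c, beta = fit_scaling_law(pilot_ns, pilot_rs)
    print(f"c={c:.3f} beta={beta:.3f} N_eff(30)={predict_n_eff(30, c, beta):.2f}")
```

If the fitted β comes out near zero, the predicted N_eff at N=30 sits just under 1/c, which signals that adding agents in the same configuration is unlikely to pay off.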

Key Points

30 dense debating LLM agents produce no more answer diversity than 1 agent on MMLU-Hard (hard-ceiling regime).
Debate gains in homogeneous teams come from re-evaluation, not peer content—a noise placebo matches self-correction at 4× scale.
Only heterogeneous (architecturally diverse) teams lower the c parameter and escape the hard-ceiling regime; communication-mode changes don't help.

Why It Matters

This scaling law gives AI teams a quick pilot test to avoid wasting compute on massive agent swarms with diminishing returns.
