Chaotic Dynamics in Multi-LLM Deliberation
New study reveals AI agent committees produce unpredictable results, even when expected to be deterministic.
A new research paper titled "Chaotic Dynamics in Multi-LLM Deliberation" reveals that committees of large language models (LLMs) exhibit fundamentally unpredictable behavior, even in settings where practitioners expect deterministic outcomes. The study, authored by Hajime Shimao, Warut Khern-am-nuai, and Sung Joo Kim, modeled five-agent LLM committees as random dynamical systems and measured their instability using empirical Lyapunov exponents (λ̂) derived from trajectory divergence in committee preferences. Across 12 policy scenarios, the researchers identified two independent routes to chaos: assigning different roles (such as Chair or Member) to identical LLMs, and mixing different LLM models (such as GPT-4 and Claude 3) without roles. Critically, this instability appears even at temperature T=0, where greedy decoding makes a single LLM's output nominally deterministic.
In the HL-01 benchmark, homogeneous committees with role assignments showed significant divergence (λ̂=0.0541), while heterogeneous committees without roles were even more unstable (λ̂=0.0947). Surprisingly, the combination of both factors (mixed models with roles) was less chaotic than heterogeneity alone, showing non-additive interaction effects. The researchers found that removing the Chair role reduced instability most effectively, and shortening the agents' memory windows further attenuated divergence. These findings challenge the assumption that multi-LLM systems for governance or decision-making will produce consistent results, highlighting the need for formal stability auditing in collective AI architectures.
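The estimator behind λ̂ can be illustrated with the standard Benettin procedure: evolve a reference trajectory and a slightly perturbed copy, renormalize their separation each step, and average the log stretching factors. The sketch below is illustrative only; it uses the logistic map as a toy stand-in for a committee's preference-update dynamics (an assumption, not the paper's actual system), since the map's true exponent (ln 2 for r=4) lets the estimate be checked.

```python
import math

def step(x, r=4.0):
    """One iteration of the logistic map -- a toy stand-in for the
    committee preference dynamics (assumption: not the paper's system)."""
    return r * x * (1.0 - x)

def lyapunov_estimate(x0=0.2, d0=1e-8, n_transient=100, n_steps=5000):
    """Benettin-style empirical Lyapunov exponent: track a reference and
    a perturbed trajectory, renormalize the separation to d0 each step,
    and average the per-step log stretching factors."""
    x = x0
    for _ in range(n_transient):  # discard transient behavior
        x = step(x)
    y = x + d0                    # nearby perturbed state
    log_sum = 0.0
    for _ in range(n_steps):
        x, y = step(x), step(y)
        d = abs(y - x)
        log_sum += math.log(d / d0)    # local stretching rate this step
        y = x + d0 * (y - x) / d       # renormalize separation back to d0
    return log_sum / n_steps

lam = lyapunov_estimate()
print(f"lambda-hat ~ {lam:.3f}")  # theory for r=4 logistic map: ln 2 ~ 0.693
```

A positive λ̂, as in both benchmark conditions above, means nearby trajectories separate exponentially on average; a zero or negative value would indicate stable, convergent deliberation.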
- Multi-LLM committees show chaotic dynamics with Lyapunov exponents up to 0.0947, meaning nearly identical starting conditions diverge exponentially into vastly different outcomes
- Two primary instability sources: role differentiation in identical LLMs and model heterogeneity in no-role committees
- Instability persists even at temperature T=0, where single LLM outputs are deterministic
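To put the headline exponent in perspective, a positive λ̂ implies divergence growing like e^(λ̂·t). The back-of-envelope calculation below assumes λ̂ is measured per deliberation round (an interpretation not stated in the summary):

```python
import math

lam = 0.0947  # largest reported empirical Lyapunov exponent
doubling = math.log(2) / lam          # rounds for divergence to double
growth_50 = math.exp(lam * 50)        # separation growth after 50 rounds

print(f"divergence doubles roughly every {doubling:.1f} rounds")
print(f"after 50 rounds, an initial perturbation grows ~{growth_50:.0f}x")
```

Under that assumption, divergence doubles about every 7 rounds, so even tiny perturbations in early committee preferences dominate the outcome within a few dozen deliberation rounds.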
Why It Matters
AI governance systems built on multi-agent deliberation may produce unreliable decisions, making formal stability auditing a prerequisite for real-world deployment.