LLM Teams Boost Quiz Accuracy by 20 Points in New Study

arXiv cs.CL June 01, 2026

⚡ Teamwork makes the dream work for LLMs, boosting quiz accuracy by up to 20 percentage points.

Deep Dive

A new paper by Kotelnikova et al. (accepted at Dialogue-2026) examines whether teams of LLMs outperform single models on complex reasoning tasks. Using six recent open-source LLMs, the authors assembled teams to answer questions from the Russian quiz game 'What? Where? When?' (ChGK), which demands indirect reasoning and cultural knowledge. They designed three interaction strategies: Voting (majority rule), Silent Team (the captain sees final answers only), and Talkative Team (the captain sees both answers and rationales). On a dataset of 572 questions from 2025, team-based approaches consistently beat single models, with accuracy gains of up to 20 percentage points. The best team reached 44.23% accuracy, approaching human-team performance on the questions for which human statistics were available. The study also found that inter-model disagreement strongly predicted lower accuracy, but that explanatory communication (the Talkative Team strategy) substantially mitigated those drops.
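The three strategies can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Proposal` type, the `captain` callable (a stand-in for an LLM call), the prompt wording, the answer normalization, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Proposal:
    """One team member's output (hypothetical structure)."""
    model: str
    answer: str
    rationale: str


def normalize(answer: str) -> str:
    # Assumed normalization: trim whitespace and ignore case.
    return answer.strip().lower()


def voting(proposals: List[Proposal]) -> str:
    """Voting: return the most frequent normalized answer.

    Ties go to the answer seen first (an assumption; the summary
    does not say how ties are broken).
    """
    counts = Counter(normalize(p.answer) for p in proposals)
    return counts.most_common(1)[0][0]


def build_captain_prompt(question: str, proposals: List[Proposal],
                         talkative: bool) -> str:
    """Silent Team lists answers only; Talkative Team adds rationales."""
    lines = [f"Question: {question}", "Teammate answers:"]
    for p in proposals:
        line = f"- {p.model}: {p.answer}"
        if talkative:
            line += f" (rationale: {p.rationale})"
        lines.append(line)
    lines.append("Give the team's final answer.")
    return "\n".join(lines)


def captain_decides(question: str, proposals: List[Proposal],
                    captain: Callable[[str], str], talkative: bool) -> str:
    # `captain` is any function mapping a prompt to an answer,
    # for example a wrapper around a model API (hypothetical here).
    return captain(build_captain_prompt(question, proposals, talkative))
```

In this sketch the only difference between the Silent and Talkative variants is whether rationales are included in the captain's prompt, which mirrors how the summary distinguishes the two.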

Further analysis of captain behavior revealed no self-preference bias: captains did not favor their own initial answers over peers' rationales. Access to peer reasoning improved captain judgments, suggesting that LLM teams act primarily as answer-selection and error-filtering mechanisms rather than as generators of novel solutions. The authors argue that adaptive strategies, in which the interaction style changes with task difficulty, are a promising direction for multi-agent systems. The research suggests that collaboration and communication, not just scale or parameter count, can significantly boost LLM reasoning in complex, knowledge-intensive domains.

Key Points

Three team strategies tested: Voting, Silent Team (captain sees answers), and Talkative Team (captain sees answers + rationales).
Best team achieved 44.23% accuracy on 572 ChGK questions, up to 20 percentage points over single-model baselines.
Explanatory communication between models reduced accuracy drops caused by inter-model disagreement.
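A quick arithmetic check on the headline numbers, as a minimal sketch. The question count and the best accuracy come from the summary above; the 24.23% single-model baseline is a hypothetical value chosen only so that the gap is exactly 20 points, since the summary does not list baseline accuracies.

```python
TOTAL_QUESTIONS = 572        # from the study summary
BEST_TEAM_ACCURACY = 44.23   # percent, from the study summary

# Number of correct answers implied by the reported accuracy.
best_correct = round(TOTAL_QUESTIONS * BEST_TEAM_ACCURACY / 100)

# HYPOTHETICAL baseline, picked for illustration only.
baseline_accuracy = 24.23

# Absolute gain in percentage points versus relative gain in percent.
point_gain = round(BEST_TEAM_ACCURACY - baseline_accuracy, 2)
relative_gain = point_gain / baseline_accuracy * 100

print(f"correct answers: {best_correct} of {TOTAL_QUESTIONS}")
print(f"gain: {point_gain} percentage points")
print(f"relative improvement over baseline: {relative_gain:.1f}%")
```

Under that assumed baseline, a 20-point gain corresponds to a relative improvement of more than 80%, which is why "20 percentage points" and "20%" are not interchangeable.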

Why It Matters

The study shows that multi-agent collaboration with explanatory communication can substantially improve LLM reasoning on complex, knowledge-intensive tasks.
