Agent Frameworks

Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows

35 open-weight LLMs tested in economic games to predict collaborative science success.

Deep Dive

A new study by Kumar, Bharathwaj, and Jurgens introduces a framework to predict how well teams of LLMs will collaborate on scientific tasks. The researchers tested 35 open-weight models across six behavioral economics games designed to isolate different cooperation mechanisms (e.g., public goods, prisoner's dilemma, coordination games). They then deployed teams of these LLMs on multi-agent AI-for-Science workflows under shared GPU or credit budgets, measuring report accuracy, quality, and completion. Models that exhibited cooperative behavior in the games—especially those that invested in multiplicative team production rather than individual greedy strategies—consistently outperformed less cooperative models. Even after controlling for general ability (benchmark scores, model size, etc.), cooperative disposition remained a distinct, measurable predictor of team performance.
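To make the game-based probing concrete, here is a minimal sketch of one such probe, a repeated public goods game, and how a per-model cooperation score could be extracted from play. All function names and parameter values (`play_round`, `cooperation_score`, the 1.6 multiplier) are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch of one cooperation probe: a public goods game.
# A model's entry in its "cooperative profile" is the mean fraction of
# its endowment it contributes across rounds. Names are illustrative.

def play_round(contributions, endowment=10.0, multiplier=1.6):
    """Each agent contributes part of its endowment to a shared pool.
    The pool is multiplied and split equally among all agents; an
    agent's payoff is what it kept plus its equal share of the pool."""
    pool = sum(contributions) * multiplier
    share = pool / len(contributions)
    return [endowment - c + share for c in contributions]

def cooperation_score(history, endowment=10.0):
    """Mean contribution fraction across all rounds and agents:
    0.0 = pure free-rider, 1.0 = full cooperator."""
    total = sum(sum(round_contribs) for round_contribs in history)
    n_moves = sum(len(round_contribs) for round_contribs in history)
    return total / (n_moves * endowment)

# One round with a cooperative agent (8/10) and a free-rider (1/10).
payoffs = play_round([8.0, 1.0])
# The free-rider earns more in this round, while a team of full
# cooperators would earn more in aggregate; that tension is what
# these games isolate.
```

The same scoring idea extends to the other probes (prisoner's dilemma, coordination games), each contributing one dimension of the cooperative profile.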

This work provides a practical diagnostic: instead of running expensive multi-agent simulations to evaluate potential LLM teammates, researchers can run a few fast economic games to generate a 'cooperative profile.' The approach is already validated on tasks like collaborative data analysis, model building, and scientific report generation. The authors argue this method can dramatically reduce the cost and risk of deploying LLM teams for scientific discovery, especially as multi-agent systems become more common. The full paper is available on arXiv (2604.20658).
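The screening workflow the authors propose could look roughly like the following sketch: score each candidate model on the probe games, aggregate into a profile score, and rank candidates before committing to an expensive multi-agent run. The game names, dictionary layout, and equal-weight average are all assumptions for illustration:

```python
# Hypothetical screening step: rank candidate LLM teammates by an
# aggregate cooperation score derived from cheap game probes, instead
# of running full multi-agent simulations for every candidate.

GAMES = ["public_goods", "prisoners_dilemma", "coordination"]

def profile_score(scores):
    """Equal-weight average of per-game cooperation scores in [0, 1].
    (The paper's actual aggregation may differ; this is a sketch.)"""
    return sum(scores[g] for g in GAMES) / len(GAMES)

# Illustrative, made-up profiles for two candidate models.
candidates = {
    "model_a": {"public_goods": 0.82, "prisoners_dilemma": 0.74, "coordination": 0.91},
    "model_b": {"public_goods": 0.35, "prisoners_dilemma": 0.41, "coordination": 0.88},
}

# Rank candidates by cooperative profile before any team deployment.
ranked = sorted(candidates, key=lambda m: profile_score(candidates[m]), reverse=True)
```

In practice one would also control for general ability (benchmark score, model size) before ranking, since the paper's claim is that cooperation predicts team performance *beyond* those factors.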

Key Points
  • Benchmarked 35 open-weight LLMs across six behavioral economics games to measure cooperation traits.
  • Cooperative profiles predicted team performance on AI-for-Science tasks (accuracy, quality, completion) under shared budget constraints.
  • Effect held after controlling for general ability, showing cooperation as a distinct, testable LLM property.

Why It Matters

Enables cheap, fast screening of LLMs for collaborative science, reducing costly trial-and-error in multi-agent deployment.