AI Safety

Communicate-Predict-Act: Evaluating Social Intelligence of Agents

A new study pits LLMs against each other in social games, revealing what skills truly matter for AI agents.

Deep Dive

A team of researchers has published a new paper, "Communicate-Predict-Act: Evaluating Social Intelligence of Agents," introducing a rigorous framework to test how AI agents handle complex social situations. The researchers created a multiplayer arena of mixed cooperative and competitive games, where they evaluated eight diverse large language models (LLMs) ranging from 24 billion to 1 trillion parameters. Using their Communicate-Predict-Act (COMPACT) protocol, they moved beyond simple Elo ratings to extract fine-grained sociocognitive metrics that capture an agent's ability to predict actions, exert communicative influence, reason strategically, and navigate trade-offs.

The study's key finding challenges the assumption that deeper cognition drives social success: metrics like influence, transparency, and adaptability were more predictive of an agent's success in social games than deeper cognitive skills like Theory of Mind inference or long-term planning. The sociocognitive metrics were strongly consistent within models and reliably predicted which agent would win a game, achieving an AUC ROC of 0.82. This work provides a testable, multidimensional conception of social intelligence, offering crucial empirical insights for developers building LLM agents destined for real-world social settings where persuasion and cooperation are key.

Key Points
  • Tested 8 LLMs (24B to 1T params) in a novel arena of social games using the COMPACT protocol.
  • Found influence and adaptability were more predictive of success than Theory of Mind; sociocognitive metrics predicted game winners with an AUC ROC of 0.82.
  • Provides a multidimensional framework for evaluating AI social intelligence beyond scalar performance scores.

Why It Matters

This framework is essential for developing AI agents that can effectively collaborate, negotiate, and operate in human social environments.