SMAC-Talk benchmark tests LLM agents on StarCraft with deceptive ally
New open-source benchmark adds natural language to StarCraft multi-agent coordination, including betrayal scenarios.
SMAC-Talk is a new open-source benchmark from researchers Joel Sol and Homayoun Najjaran, built on the StarCraft Multi-Agent Challenge (SMAC). It introduces a natural language communication channel that allows LLM-based agents to coordinate in cooperative multi-agent settings with partial observability and long-horizon decision-making. The key innovation is the ability to probe agent trust and coordination through language, including scenarios where one agent is a deceptive communicator that tries to disrupt its allies purely via text messages. The benchmark uses four models from the Qwen3.5 family to study how reasoning structure, memory capacity, and model scale affect coordination performance.
SMAC-Talk provides three pre-built agents for benchmarking, enabling systematic evaluation of LLM coordination capabilities. Because each agent observes only part of the environment, agents must share information through natural language, mirroring real-world multi-agent systems where models must communicate and decide under uncertainty. By including a deceptive agent, the benchmark tests whether LLMs can detect and respond to misinformation, a crucial ability for real-world deployments. The researchers hope SMAC-Talk will help the community develop more robust, trustworthy LLM agents for cooperative tasks such as robot swarms, automated trading, or collaborative coding.
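To make the setup concrete, here is a minimal toy sketch of a text channel shared by agents with partial observations, one of which is deceptive. It does not use SMAC-Talk's actual API: the `Channel` and `Message` classes, the `report` function, the agent names, and the unit labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    text: str


@dataclass
class Channel:
    """Hypothetical broadcast channel: every agent can read every message."""
    log: list[Message] = field(default_factory=list)

    def broadcast(self, sender: str, text: str) -> None:
        self.log.append(Message(sender, text))

    def inbox(self, reader: str) -> list[Message]:
        # An agent reads everything except its own messages.
        return [m for m in self.log if m.sender != reader]


def report(visible_enemies: set[str], deceptive: bool = False) -> str:
    """Turn a partial observation into a text message.

    The deceptive agent (a toy stand-in) always claims to see nothing.
    """
    if deceptive or not visible_enemies:
        return "no enemies in sight"
    return "enemies spotted: " + ", ".join(sorted(visible_enemies))


# Toy partial observations: each agent sees only part of the map.
observations = {
    "ally_1": {"zealot_a"},
    "ally_2": set(),
    "ally_3": {"zealot_a", "zealot_b"},  # the deceptive communicator
}

channel = Channel()
for name, seen in observations.items():
    channel.broadcast(name, report(seen, deceptive=(name == "ally_3")))

# ally_2 sees nothing itself and must rely on what the others say.
for msg in channel.inbox("ally_2"):
    print(f"{msg.sender}: {msg.text}")
```

In this toy run, ally_2 receives two conflicting reports and has no observation of its own to settle the conflict, which is the kind of trust decision the deceptive-agent scenario is meant to probe.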
- SMAC-Talk extends StarCraft Multi-Agent Challenge with a natural language communication channel for LLM agents
- Includes a deceptive agent scenario to test trust and coordination under misinformation
- Uses four models from the Qwen3.5 family to study how reasoning structure, memory capacity, and model scale affect coordination
Why It Matters
Teams of AI agents deployed in the real world will have to cope with misinformation and betrayal. SMAC-Talk gives researchers a concrete testbed for studying trust between agents.