Agent Frameworks

DLM: Unified Decision Language Models for Offline Multi-Agent Sequential Decision Making

New model treats multi-agent coordination as a dialogue, beating baselines by 40%

Deep Dive

Researchers from an undisclosed institution have introduced the Decision Language Model (DLM), a novel approach that unifies offline multi-agent sequential decision making using large language models (LLMs). The key insight is to reframe multi-agent coordination as a dialogue-style sequence prediction problem under the centralized training with decentralized execution (CTDE) paradigm. DLM is trained in two stages: first, supervised fine-tuning on dialogue-style datasets that incorporate inter-agent context and execute actions from offline trajectories; second, group relative policy optimization (GRPO) to improve robustness against out-of-distribution actions via lightweight reward functions.

On multiple benchmarks, a single unified DLM outperforms strong offline MARL baselines and existing LLM-based conversational decision-making methods. Notably, the model demonstrates strong zero-shot generalization to unseen scenarios across tasks, suggesting it can adapt to new environments without additional training. The work, detailed in a 22-page paper with 11 figures, addresses a key limitation of traditional MARL approaches that rely on fixed observation formats and action spaces. By leveraging the flexible modeling interface of LLMs, DLM can naturally handle heterogeneous observations and actions, making it more scalable and reusable for real-world multi-agent systems like autonomous driving fleets, warehouse robotics, or coordinated drone swarms.

Key Points
  • DLM reframes multi-agent decision making as a dialogue-style sequence prediction under CTDE paradigm
  • Two-stage training: supervised fine-tuning + group relative policy optimization (GRPO) for robustness
  • Outperforms offline MARL baselines and LLM-based methods with strong zero-shot generalization

Why It Matters

DLM could enable scalable, reusable multi-agent policies for autonomous driving, robotics, and drone coordination.