TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing
A new system supports 2.7x more concurrent agents than vLLM by eliminating redundant KV cache memory.
A research team from Peking University and other institutions has introduced TokenDance, a novel system designed to overcome a critical bottleneck in multi-agent LLM applications. These applications, where multiple AI agents work together in synchronized rounds, suffer from massive redundancy. Each agent receives the same shared context from a central scheduler, causing identical Key-Value (KV) cache data to be stored repeatedly across all agents. TokenDance directly exploits this 'All-Gather' communication pattern to enable collective KV cache sharing across an entire round of agent execution in a single step.
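To make the redundancy concrete, here is a back-of-envelope sketch (with illustrative model dimensions, not figures from the paper): if every agent in a round caches the same shared context, total KV memory grows linearly with the number of agents, while a single collective copy stays constant.

```python
# Illustrative sketch of the redundancy in an 'All-Gather' round.
# Model dimensions below are hypothetical, not from the TokenDance paper.

def kv_bytes(tokens, layers=32, heads=32, head_dim=128, dtype_bytes=2):
    """KV cache size: K and V (factor of 2) per token, per layer, fp16."""
    return tokens * layers * 2 * heads * head_dim * dtype_bytes

shared_tokens = 4096   # tokens of shared context broadcast to every agent
agents = 100           # agents executing in one synchronized round

per_agent = kv_bytes(shared_tokens)   # ~2 GiB per agent at these dimensions
naive_total = agents * per_agent      # every agent caches an identical copy
shared_total = per_agent              # one collective copy, shared by all
```

Under these assumed dimensions the naive layout stores 100 identical ~2 GiB copies; collective sharing keeps one.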
TokenDance's core innovation is its 'Diff-Aware Storage' engine. Instead of storing complete, duplicate KV caches for each agent, the system maintains a single master copy of shared context blocks. For each agent, it then stores only the sparse differences (diffs) from this master, achieving 11-17x compression on representative workloads. Evaluations on benchmarks like GenerativeAgents and AgentSociety show concrete results: TokenDance supports up to 2.7x more concurrent agents than the popular vLLM framework (with prefix caching) under the same service-level objective, reduces per-agent KV cache storage by up to 17.5x, and speeds up the initial 'prefill' phase of processing by up to 1.9x compared to other caching methods.
- Enables collective KV cache sharing for multi-agent LLMs, paying the reuse cost once per round instead of per agent.
- Uses 'Diff-Aware Storage' to encode caches as block-sparse diffs, achieving 11-17x compression on representative workloads.
- Benchmarks show up to 2.7x more concurrent agents than vLLM, up to 17.5x less per-agent KV cache storage, and up to 1.9x faster prefill.
Why It Matters
This makes complex, multi-agent AI applications far more scalable and cost-effective to deploy.