Agent Frameworks

CASPIAN detects cascade attacks in LLM multi-agent systems with sub-1% overhead

New framework spots adversarial influence spreading across AI agents in real time.

Deep Dive

Cascade attacks in LLM multi-agent systems occur when adversarial prompts or behaviors spread from one agent to another, causing escalating failures that can look benign at each individual agent but unfold rapidly across the system. According to the authors, existing defenses are text-centric and miss the cross-channel, temporally coordinated nature of such attacks. CASPIAN instead models multi-agent interactions with a unified dynamic causal influence matrix, estimated via a late-interaction conditional transfer entropy (LI-CTE) formulation. Transfer entropy measures how much one agent's past activity improves prediction of another agent's next step, so the matrix captures who is influencing whom over time. This lets CASPIAN detect cascade onset from emergent system-level structure rather than from isolated anomalies.
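To make the transfer-entropy idea concrete, here is a minimal sketch of a textbook plug-in estimator for conditional transfer entropy on discrete sequences. It is not the paper's LI-CTE formulation, whose details are not given here. The function name, the history length of one step, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import log2

def conditional_transfer_entropy(x, y, z=None):
    """Plug-in estimate of TE(X -> Y | Z) in bits, with history length 1.

    Illustrative only: a standard count-based estimator, not LI-CTE.
    x, y, z are equal-length sequences of hashable symbols.
    """
    if z is None:
        z = [0] * len(y)  # constant conditioner reduces this to plain TE
    if not (len(x) == len(y) == len(z)):
        raise ValueError("x, y and z must have the same length")
    n = len(y) - 1  # number of (now, next) transitions

    joint = Counter()     # counts of (y_next, y_now, z_now, x_now)
    ctx_full = Counter()  # counts of (y_now, z_now, x_now)
    pair = Counter()      # counts of (y_next, y_now, z_now)
    ctx = Counter()       # counts of (y_now, z_now)
    for t in range(n):
        joint[(y[t + 1], y[t], z[t], x[t])] += 1
        ctx_full[(y[t], z[t], x[t])] += 1
        pair[(y[t + 1], y[t], z[t])] += 1
        ctx[(y[t], z[t])] += 1

    te = 0.0
    for (y1, y0, z0, x0), count in joint.items():
        p_with_x = count / ctx_full[(y0, z0, x0)]         # p(y_next | y, z, x)
        p_without_x = pair[(y1, y0, z0)] / ctx[(y0, z0)]  # p(y_next | y, z)
        te += (count / n) * log2(p_with_x / p_without_x)
    return te

# Toy example: agent Y repeats agent X's symbol one step later.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
y = [0] + x[:-1]
print(conditional_transfer_entropy(x, y))       # influence of X on Y
print(conditional_transfer_entropy(x, y, z=x))  # conditioned on a channel equal to X
```

In the toy example the first value is positive because X's current symbol fully determines Y's next one, while conditioning on a channel that already carries the same information leaves X with nothing extra to explain. Evaluating an estimator like this for every ordered pair of agents would give one simple form of influence matrix; real agent traces are not discrete symbols, so a practical system would need some encoding step first.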

CASPIAN goes beyond detection by performing online causal attribution: it identifies the origin agent, bridge agents (which propagate the attack), and amplifier agents (which magnify its impact). It also reconstructs the principal propagation pathways, a capability the authors say prior methods lack. Across diverse multi-agent frameworks and benchmarks, the authors report that CASPIAN consistently outperforms semantic guardrails, LLM-based judges, and graph-based anomaly detectors in both detection accuracy and early identification, while adding less than 1% relative latency overhead. The paper is available on arXiv (2605.19240) and the code is linked on GitHub.
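The origin, bridge, and amplifier roles can be pictured with a small sketch that reads them off an influence matrix. The heuristics below are hypothetical and are not CASPIAN's attribution procedure: the functions `attribute_roles` and `principal_path`, the thresholds, the agent names, and the matrix values are all invented for illustration.

```python
def attribute_roles(influence, names, eps=1e-9):
    """Assign illustrative roles from influence[i][j] (agent i acting on agent j).

    Hypothetical rules: the origin has the largest net outgoing influence,
    amplifiers send out more influence than they receive, and bridges
    receive influence and pass some of it on without amplifying it.
    """
    n = len(names)
    out_flow = [sum(influence[i][j] for j in range(n) if j != i) for i in range(n)]
    in_flow = [sum(influence[j][i] for j in range(n) if j != i) for i in range(n)]

    origin = max(range(n), key=lambda i: out_flow[i] - in_flow[i])
    receivers = [i for i in range(n) if i != origin and in_flow[i] > eps]
    amplifiers = [i for i in receivers if out_flow[i] > in_flow[i]]
    bridges = [i for i in receivers if out_flow[i] > eps and i not in amplifiers]
    return {
        "origin": names[origin],
        "bridges": [names[i] for i in bridges],
        "amplifiers": [names[i] for i in amplifiers],
    }

def principal_path(influence, names, start, min_weight=0.1):
    """Greedily follow the strongest outgoing edge from `start`."""
    current = names.index(start)
    path, visited = [start], {current}
    while True:
        options = [j for j in range(len(names)) if j not in visited]
        if not options:
            break
        best = max(options, key=lambda j: influence[current][j])
        if influence[current][best] < min_weight:
            break  # no sufficiently strong edge left to follow
        path.append(names[best])
        visited.add(best)
        current = best
    return path

# Made-up influence matrix for four hypothetical agents.
names = ["planner", "retriever", "coder", "reviewer"]
influence = [
    [0.0, 0.6, 0.0, 0.0],  # planner influences retriever
    [0.0, 0.0, 0.5, 0.0],  # retriever passes influence on to coder
    [0.0, 0.0, 0.0, 0.9],  # coder pushes more out than it took in
    [0.0, 0.0, 0.0, 0.0],  # reviewer only receives
]
roles = attribute_roles(influence, names)
print(roles)
print(principal_path(influence, names, roles["origin"]))
```

With these made-up numbers the planner is flagged as the origin, the retriever as a bridge, and the coder as an amplifier, and the greedy walk recovers the chain from planner to reviewer. A greedy walk returns only one path; recovering several principal pathways would need a more careful search.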

Key Points
  • CASPIAN uses late-interaction conditional transfer entropy (LI-CTE) to estimate cross-channel causal influence in LLM multi-agent systems.
  • It detects cascade onset from emergent system-level structure, not isolated anomalies, and operates with sub-1% relative latency overhead.
  • It is reported to outperform semantic guardrails, LLM-based judges, and graph-based detectors in accuracy and early identification across diverse benchmarks.

Why It Matters

As multi-agent AI systems proliferate, a low-overhead detector like CASPIAN could let operators catch cascading failures in real time and trace them back to the agents that started and spread them.