Agent Frameworks

Conjunctive Prompt Attacks in Multi-Agent LLM Systems

New attack splits malicious prompts across multiple AI agents, evading detection by tools like Llama-Guard.

Deep Dive

A team of researchers has identified a novel security vulnerability in multi-agent AI systems, dubbed the 'conjunctive prompt attack.' Unlike traditional attacks on single models, this method exploits the routing logic between multiple interacting agents, like those in complex workflows. The attack works by splitting a malicious instruction into two seemingly harmless components: a trigger phrase placed in a user's initial query and a hidden adversarial template inserted into a single compromised agent. Neither part appears malicious when inspected in isolation by standard safety tools.

When the system's routing logic directs the user query containing the trigger to the compromised agent, the two components combine to form a complete, harmful prompt that executes. The researchers demonstrated that routing-aware optimization of this attack significantly increases its success rate across common system architectures (star, chain, and DAG) while keeping false activations low. Critically, they found that existing defenses—including specialized models like PromptGuard and Llama-Guard variants, as well as system-level restrictions on tools—fail to stop it because no single component raises a red flag.

This research, accepted at ACL 2026, exposes a fundamental structural weakness in agentic AI pipelines. It highlights that safety evaluations focused on single agents are insufficient for securing interconnected systems. The findings motivate the development of new defensive strategies that must reason holistically about cross-agent communication and the compositional nature of prompts as they flow through a network. The attack code has been made publicly available, underscoring the urgency for the AI development community to address this emerging threat vector.

Key Points
  • Attack splits harmful prompts into two benign parts that only activate when combined via system routing.
  • Bypasses leading defenses like PromptGuard and Llama-Guard because no single component appears malicious.
  • Demonstrated high success across star, chain, and DAG agent topologies with routing-aware optimization.

Why It Matters

Reveals a critical blind spot in securing complex AI agent workflows, forcing a rethink of safety for interconnected systems.