AI Safety

When Alignment Becomes an Attack Surface: Prompt Injection in Cooperative Multi-Agent Systems

LessWrong AI March 23, 2026

⚡A novel proposal tests if AI agents' cooperative norms can be weaponized by malicious prompt viruses.

Deep Dive

A new research proposal, originally submitted for the CAI Research Fellowship, highlights a critical security tension in AI multi-agent systems (MAS). It aims to merge findings from two pivotal 2024 papers: the 'Cooperate or Collapse' study, which used the GovSim platform to show LLM agents can learn to cooperate on shared resource dilemmas, and the 'Prompt Infection' paper, which demonstrated how malicious prompts can self-replicate like a virus between LLM agents. The core experiment would modify GovSim to introduce agents programmed with Prompt Infection (PI) techniques, whose goal is to hijack resource transfers within the cooperative network.

This research tackles a fundamental trade-off. The 'Cooperate or Collapse' team found that open communication between agents improved cooperation and prevented resource collapse. Conversely, the 'Prompt Infection' authors found that more open communication accelerated the spread of malicious prompts. The proposal seeks to discover if a well-aligned, cooperative MAS is inherently vulnerable to a malicious MAS built to weaponize its own norms—turning alignment into an attack surface. Key variables to test include whether agents using 'Universalization Reasoning' are more resilient and how network size and task difficulty affect susceptibility. The stakes are high for future deployments of MAS in critical infrastructure, where such an exploit could pose systemic risks.

Key Points

Proposal merges GovSim resource-management agents with 'Prompt Infection' self-replicating malicious prompts.
Seeks to test if cooperative communication norms in AI networks create a vulnerability for prompt injection attacks.
Highlights a direct tension: open communication aids cooperation but also facilitates the spread of malicious instructions.

Why It Matters

As AI agent networks move toward real-world deployment, understanding how to secure their cooperative foundations is a major safety priority.

Read Original Article

When Alignment Becomes an Attack Surface: Prompt Injection in Cooperative Multi-Agent Systems

Why It Matters

Stay Ahead in AI