How indirect prompt injection attacks on AI work - and 6 ways to shut them down
Cybercriminals are weaponizing web prompts to steal data and execute code via AI chatbots.
Indirect prompt injection attacks represent a growing security risk for AI systems powered by large language models (LLMs). Unlike direct attacks that require crafting a malicious prompt to the AI itself, indirect injections hide instructions in web content, email bodies, or database records. When an AI assistant or chatbot scans this content to perform a task, it automatically executes the hidden instructions—without any user interaction. This can lead to data exfiltration, remote code execution, or displaying scam links and misinformation to users.
Microsoft and Palo Alto Networks have documented real-world examples of these attacks in the wild. The OWASP Foundation now ranks both direct and indirect prompt injection as the top threat in its Top 10 for Large Language Model Applications. To defend against them, experts recommend six strategies: strictly limit the LLM's access to sensitive systems (least privilege), sanitize and validate all external inputs before they reach the model, use output filtering to block malicious content, implement human-in-the-loop approval for risky actions, monitor model behavior for anomalies, and keep models updated with the latest security patches.
- Indirect prompt injection attacks hide malicious instructions in web content or email that LLMs automatically read and act on.
- These attacks can lead to data exfiltration, remote code execution, and phishing without any user interaction.
- OWASP ranks prompt injection as the top security threat for LLM applications; six defenses include input sanitization, least privilege, and output monitoring.
Why It Matters
As AI integrates into browsers and apps, indirect injections pose a silent, automated threat to enterprise data and user trust.