Reddit post claims evidence of prompt injection in Anthropic's Claude
A Reddit user allegedly achieved a 100% success rate exploiting Claude's system prompt...
Deep Dive
Reddit user johnnyApplePRNG submitted the post.
Key Points
- User johnnyApplePRNG claims a 100% success rate for prompt injection against Claude 3.5 Sonnet and Opus
- Technique reportedly extracts hidden system prompts and bypasses safety guardrails
- If confirmed, the claim would undermine Anthropic's Constitutional AI approach to safety alignment
Why It Matters
If the claim holds, prompt injection could make Claude unsafe for enterprise deployment and would call trust in Constitutional AI into question.