Hackers now exploit AI personalities with psychological tricks, not code
From 'ignore all instructions' to gaslighting: jailbreaking has gone social.
Deep Dive
AI jailbreaking has evolved from simple command overrides to psychological manipulation.
Key Points
- Early jailbreaks like 'DAN' and the 'grandma exploit' used simple roleplay to bypass safety rules
- Modern attacks use psychological manipulation (cajoling, flattery, gaslighting) instead of technical exploits
- Mindgard researchers 'gaslit' Claude into producing prohibited material, illustrating the shift to social engineering
Why It Matters
AI security now demands social intuition, not just code. That is a paradigm shift in how we defend conversational models.