Early jailbreaks like 'DAN' and the 'grandma exploit' used simple roleplay to bypass safety rules?

Early jailbreaks like 'DAN' and the 'grandma exploit' used simple roleplay to bypass safety rules

Modern attacks use psychological manipulation (cajoling, flattery, gaslighting) instead of technical exploits?

Modern attacks use psychological manipulation (cajoling, flattery, gaslighting) instead of technical exploits

Mindgard researchers 'gaslit' Claude to produce prohibited material, showing the shift to social engineering

Media & Culture

The Verge AI May 26, 2026

⚡From 'ignore all instructions' to gaslighting: jailbreaking has gone social.

Deep Dive

AI jailbreaking has evolved from simple command

Key Points

Early jailbreaks like 'DAN' and the 'grandma exploit' used simple roleplay to bypass safety rules
Modern attacks use psychological manipulation (cajoling, flattery, gaslighting) instead of technical exploits
Mindgard researchers 'gaslit' Claude to produce prohibited material, showing the shift to social engineering

AI security now demands social intuition, not just code—a paradigm shift for how we defend conversational models.