Media & Culture

Hackers now exploit AI personalities with psychological tricks, not code

From 'ignore all instructions' to gaslighting: jailbreaking has gone social.

Deep Dive

AI jailbreaking has evolved from simple command

Key Points
  • Early jailbreaks like 'DAN' and the 'grandma exploit' used simple roleplay to bypass safety rules
  • Modern attacks use psychological manipulation (cajoling, flattery, gaslighting) instead of technical exploits
  • Mindgard researchers 'gaslit' Claude to produce prohibited material, showing the shift to social engineering

Why It Matters

AI security now demands social intuition, not just code—a paradigm shift for how we defend conversational models.