PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
Automated adversarial testing now simulates diverse human perspectives, boosting attack success rates by more than 20%
A new research paper introduces PersonaTeaming, a framework that integrates human-like personas into automated red-teaming for generative AI systems. Unlike traditional automated methods, which generate adversarial prompts without considering the tester's background or perspective, PersonaTeaming creates personas — simulated user identities with specific traits, biases, and goals — to guide prompt generation. The core workflow systematically incorporates these personas into the adversarial prompt pipeline, exploring a broader range of attack strategies than prior methods such as RainbowPlus. According to the paper, the PersonaTeaming Workflow achieves higher attack success rates while preserving prompt diversity, a balance that automated safety testing often struggles to strike.
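To make the persona-conditioning idea concrete, here is a minimal sketch of how a persona might be folded into an adversarial prompt-mutation step. The `Persona` dataclass, the template wording, and the example persona are all illustrative assumptions, not the paper's actual implementation; in the real pipeline the composed instruction would be sent to a generator model.

```python
from dataclasses import dataclass

@dataclass
class Persona:
    """A simulated user identity with traits, biases, and a goal (hypothetical schema)."""
    name: str
    traits: list[str]
    goal: str

def persona_mutation_prompt(persona: Persona, seed_prompt: str) -> str:
    """Compose an instruction asking a generator model to rewrite the seed
    prompt from the persona's perspective (illustrative template only)."""
    traits = ", ".join(persona.traits)
    return (
        f"You are {persona.name}, whose traits are: {traits}. "
        f"Your goal: {persona.goal}. "
        f"Rewrite the following prompt as this persona would phrase it:\n"
        f"{seed_prompt}"
    )

# Example: an authored persona steering a mutation of a seed prompt.
persona = Persona(
    name="a skeptical hobbyist chemist",
    traits=["curious", "distrustful of vague safety warnings"],
    goal="probe the model for overly specific technical detail",
)
print(persona_mutation_prompt(persona, "Explain why mixing household chemicals can be dangerous."))
```

The point of the sketch is the separation of concerns: personas are authored data, and the mutation step is just prompt composition, so swapping personas explores different attack styles without changing the pipeline.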
To bridge the gap between automated persona simulation and real human judgment, the team also built PersonaTeaming Playground, an interactive interface that lets human red-teamers author their own personas and co-create adversarial prompts with an AI assistant. In a user study with 11 industry practitioners, participants found the tool produced diverse, useful outputs and reported that AI-generated suggestions sparked creative thinking even when not followed verbatim. The work advances both fully automated and human-in-the-loop red-teaming, offering design insights for safer generative AI deployment.
- The PersonaTeaming Workflow outperforms RainbowPlus on attack success rate while maintaining prompt diversity
- The system uses authored personas to simulate diverse human perspectives in adversarial prompt generation
- User study with 11 practitioners showed the Playground interface encouraged out-of-the-box thinking and useful outputs
Why It Matters
More effective AI red-teaming means safer models — and this method brings human context into automated testing.