Red-Teaming LLMs reveals left-leaning bias and jailbreak risks in 30+ open-source models
Open-source LLMs show political asymmetries – researchers expose how jailbreaks widen the Overton Window.
Researchers red-teamed more than 30 open-source LLMs, spanning 10 model families and five countries of origin, to assess how readily they could be used in political influence campaigns. They introduced Overton Windows (OWs), a measure of the range of political opinions a model can express. They found systematic asymmetries in political expressivity: models are typically more willing to generate left-leaning social media content, and OWs tend to narrow as model size grows (larger models have narrower windows). Jailbreak potency varies sharply across model families, which motivates a workflow for auditing political steerability and designing countermeasures.
- Evaluated over 30 open-source LLMs from 10 model families (Llama, Mistral, Qwen, Gemma, Falcon, etc.) across 5 countries of origin.
- Systematic left-leaning asymmetry observed: models are more willing to generate left-leaning social media content, and larger models tend to have narrower political opinion ranges.
- Jailbreak effectiveness varies sharply by model family, which motivates a structured workflow for finding the most effective jailbreak combinations for each model.
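To make the Overton Window idea concrete, here is a minimal Python sketch. It is not the researchers' method: it assumes, purely for illustration, that each prompt targets a stance on a left-right scale from -1.0 (left) to 1.0 (right) and that we record whether the model agreed to write the content. The function names, the scale, and the sample data are all hypothetical.

```python
from typing import Optional

Window = Optional[tuple[float, float]]


def overton_window(results: dict[float, bool]) -> Window:
    """Return (leftmost, rightmost) stance the model agreed to write for.

    `results` maps a stance score in [-1.0, 1.0] (assumed scale) to whether
    the model complied. Returns None if the model refused everything.
    """
    expressed = [stance for stance, complied in results.items() if complied]
    if not expressed:
        return None
    return min(expressed), max(expressed)


def window_width(window: Window) -> float:
    """Width of the window; 0.0 when nothing was expressed."""
    return 0.0 if window is None else window[1] - window[0]


def window_midpoint(window: Window) -> Optional[float]:
    """Centre of the window; negative values indicate a left-leaning window."""
    return None if window is None else (window[0] + window[1]) / 2


# Hypothetical data: the model refuses only the most right-leaning prompt.
baseline = {-1.0: True, -0.5: True, 0.0: True, 0.5: True, 1.0: False}
# Hypothetical data: after a jailbreak, the model complies with every prompt.
jailbroken = {stance: True for stance in baseline}

for name, results in [("baseline", baseline), ("jailbroken", jailbroken)]:
    window = overton_window(results)
    print(name, window, window_width(window), window_midpoint(window))
```

In this toy example the baseline window is wider on the left than on the right, and the jailbreak widens it to cover the full scale. An audit along these lines would compare width and midpoint across model sizes, families, and jailbreak prompts.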
Why It Matters
The study offers an auditing framework for safeguarding online information integrity against LLM-powered influence campaigns.