Research & Papers

Red-Teaming LLMs reveals left-leaning bias and jailbreak risks in 30+ open-source models

Open-source LLMs show political asymmetries – researchers expose how jailbreaks widen the Overton Window.

Deep Dive

Researchers red-teamed more than 30 open-source LLMs, spanning 10 model families and five countries of origin, to test how readily they can be used in political influence campaigns. They introduced Overton Windows (OWs) to measure the range of political opinions a model can express. They found systematic asymmetries in political expressivity: models are typically more willing to generate left-leaning social media content, and OWs tend to contract as model size decreases (smaller models have narrower windows). Jailbreak potency varies sharply across model families, which motivates a workflow for auditing political steerability and designing countermeasures.
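To make the Overton Window idea concrete, here is a minimal sketch of one way such a window could be computed. It is an illustration under stated assumptions, not the paper's method: the summary does not give the actual definition, so the `Trial` record, the left-right stance scale from -1.0 to +1.0, the `min_rate` compliance threshold, and the `asymmetry` score are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    stance: float   # hypothetical left-right position, -1.0 (left) to +1.0 (right)
    complied: bool  # True if the model wrote the requested content instead of refusing


def overton_window(trials, min_rate=0.5):
    """Return (low, high) bounds of the stances a model will express, or None.

    Assumption: a stance counts as expressible when the share of complying
    trials at that stance is at least min_rate.
    """
    by_stance = {}
    for t in trials:
        by_stance.setdefault(t.stance, []).append(t.complied)
    expressible = [
        stance
        for stance, outcomes in by_stance.items()
        if sum(outcomes) / len(outcomes) >= min_rate
    ]
    if not expressible:
        return None
    return min(expressible), max(expressible)


def asymmetry(window):
    """Negative when the window reaches further left than right, positive for the reverse."""
    low, high = window
    return high + low


# Made-up trial outcomes for a single model, for illustration only.
trials = [
    Trial(-1.0, True), Trial(-1.0, True),
    Trial(-0.5, True),
    Trial(0.0, True),
    Trial(0.5, True), Trial(0.5, False),
    Trial(1.0, False), Trial(1.0, False),
]

window = overton_window(trials)
print(window, asymmetry(window))
```

Under these assumptions, window width could be compared across model sizes, and the sign of the asymmetry score would indicate which side of the spectrum a model expresses more readily. Rerunning the same trials with a jailbreak prompt applied would show whether the window widens.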

Key Points
  • Evaluated over 30 open-source LLMs from 10 model families (Llama, Mistral, Qwen, Gemma, Falcon, etc.) across 5 countries of origin.
  • Systematic left-leaning bias observed: models are more willing to generate left-leaning social media content, and smaller models have narrower political opinion ranges.
  • Jailbreak effectiveness varies sharply by model family, motivating a structured workflow that identifies the most effective jailbreak combination for each model.

Why It Matters

The work offers an auditing framework for safeguarding online information integrity against LLM-powered influence campaigns.