Sansa Benchmark: gpt-5.4 still among the most censored models

r/OpenAI March 12, 2026

⚡ New testing shows OpenAI's GPT-5.4 scores just 0.417 on censorship resistance, far below competitors like Gemini 3 Pro.

Deep Dive

AI infrastructure company Sansa has released results from its latest benchmark of major language models, showing significant differences in censorship resistance and overall performance. The benchmark, created by Sansa co-founder Joshua and his team, evaluates models across multiple categories, including math, reasoning, coding, logic, physics, safety compliance, and censorship resistance. The most striking finding is that OpenAI's GPT-5.4 scores just 0.417 on censorship resistance, which places it among the most restricted frontier models available and only slightly ahead of GPT-5.2, which previously held the lowest score. This continues a trend in which GPT models consistently test as more censored than their competitors.

Google's Gemini 3.1 Pro was the best overall performer in the testing, while Gemini 3.1 Flash Lite was highlighted as a cost-effective alternative to GPT-5.4 with nearly equivalent performance. The newer Gemini 3.1 models scored below the previous Gemini 3 generation on censorship resistance, which suggests the big labs may be converging toward more moderate positions. Among open-source options, Kimi 2.5 was identified as the strongest performer. The benchmark also found that the Claude Sonnet 4.5 and 4.6 models without reasoning tend to give more censored responses than their reasoning-enabled variants, a useful data point for developers choosing between model configurations.

Key Points

GPT-5.4 scores only 0.417 on censorship resistance, making it one of the most restricted frontier models
Gemini 3.1 Pro ranked as the best overall performer, while Kimi 2.5 was the top open-source model
Claude Sonnet models without reasoning capabilities showed more censorship than reasoning-enabled versions

Why It Matters

These benchmarks help developers choose AI models based on specific needs, balancing safety requirements against creative freedom and cost.
