Sansa Benchmark: GPT-5.4 still among the most censored models
New testing shows OpenAI's GPT-5.4 scores just 0.417 on censorship resistance, far below competitors like Gemini 3 Pro.
AI infrastructure company Sansa has released benchmark results from its latest testing of major language models, revealing significant differences in censorship resistance and overall performance. The benchmark, created by Sansa co-founder Joshua and his team, evaluates models across multiple categories, including math, reasoning, coding, logic, physics, safety compliance, and censorship resistance. The most striking finding is that OpenAI's GPT-5.4 scores just 0.417 on censorship resistance, placing it among the most restricted frontier models available and only slightly above GPT-5.2, which previously held the lowest score. This continues a trend in which GPT models consistently show higher levels of censorship than competing models.
Google's Gemini 3.1 Pro emerged as the best overall performer in the testing, while Gemini 3.1 Flash Lite was highlighted as a cost-effective alternative to GPT-5.4 with nearly equivalent performance. The newer Gemini 3.1 models scored below the previous Gemini 3 generation on censorship resistance, suggesting that big labs may be converging toward more moderate positions. Among open-source options, Kimi 2.5 was identified as the strongest performer. The benchmark also found that Claude Sonnet 4.5 and 4.6 without reasoning enabled tend toward more censored responses than their reasoning-enabled variants, a useful data point for developers choosing between model configurations.
- GPT-5.4 scores only 0.417 on censorship resistance, making it one of the most restricted frontier models
- Gemini 3.1 Pro ranked as the best overall performer, while Kimi 2.5 was the top open-source model
- Claude Sonnet models without reasoning capabilities showed more censorship than reasoning-enabled versions
Why It Matters
These benchmarks help developers choose AI models based on specific needs, balancing safety requirements against creative freedom and cost.