Media & Culture

New Benchmark "InsanityBench": Gemini 3.1 Pro Scores 15%

New benchmark measures 'insane' scientific creativity, with top AI models failing spectacularly.

Deep Dive

A new benchmark called InsanityBench tests AI models on scientific creativity, and even the best performers score very low. Created by researcher Robin Haselhorst, the benchmark measures what he calls 'insane' creativity: the kind of breakthrough, non-obvious thinking that drives major scientific discoveries. Unlike traditional benchmarks, which can be gamed through pattern recognition, InsanityBench consists of unique tasks that require genuinely novel reasoning.

Google's Gemini 3.1 Pro, one of the most capable models available, currently tops the leaderboard with a score of just 15%, suggesting how far current AI systems remain from true creative reasoning. The tasks are designed to differ enough from one another to prevent the memorization and pattern-matching approaches that dominate other evaluations. With the best model at only 15%, the benchmark is far from saturated and should remain challenging even as models improve.

The implications are significant for scientific research applications. While current AI excels at data analysis and literature review, InsanityBench suggests these systems struggle with the kind of creative leaps that characterize breakthrough discoveries. This draws a clear distinction between AI as a research assistant and AI as a research partner capable of original thought. The benchmark pushes developers to move beyond pattern recognition toward systems that can genuinely innovate.

Key Points
  • Gemini 3.1 Pro scores only 15% on new creativity benchmark
  • InsanityBench measures 'insane' scientific creativity with unique, non-gameable tasks
  • Benchmark reveals gap between AI pattern recognition and genuine creative reasoning

Why It Matters

Reveals current AI's limitations in scientific innovation, pushing development toward true creative reasoning systems.