Research & Papers

VeRA: Verified Reasoning Data Augmentation at Scale

New method creates infinite, verified test questions to stop AI models from memorizing answers.

Deep Dive

Researchers propose VeRA, a framework that converts static benchmark problems into executable specifications capable of generating unlimited, verified test variants. It produces fresh, difficult tasks at near-zero cost and without human involvement, preventing models from exploiting memorized patterns. In tests of 16 frontier models, the VeRA-E variant exposed benchmark contamination, while VeRA-H generated hard tasks with reliable labels. This shifts benchmarks from static artifacts to on-demand generators, aiming to make AI evaluation more robust and scalable. All code is open-sourced.
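The core idea — a benchmark item as an executable specification rather than a fixed question/answer pair — can be sketched roughly as follows. This is an illustrative toy, not the authors' actual implementation; the function names, the arithmetic template, and the interface are all assumptions.

```python
import random

def make_instance(seed):
    """Sample a fresh problem variant from a parameterized template.

    Hypothetical template: a simple arithmetic-reasoning question whose
    ground-truth answer is computed programmatically, not hand-labeled.
    """
    rng = random.Random(seed)
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    question = f"What is {a} * {b} + {a}?"
    answer = a * b + a  # verified by construction
    return question, answer

def verify(instance, model_output):
    """Check a model's raw output against the computed ground truth."""
    _, answer = instance
    try:
        return int(model_output.strip()) == answer
    except ValueError:
        return False

# Each seed yields a distinct, automatically verified variant, so a
# memorized answer key for the original benchmark is useless.
question, answer = make_instance(seed=42)
print(verify((question, answer), str(answer)))
```

Because every variant carries a machine-checkable answer, evaluation needs no human grading, which is what allows the generator to run at near-zero cost.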

Why It Matters

This could finally stop AI models from 'cheating' on benchmarks, so that measured gains reflect genuine progress in reasoning rather than memorization.