Media & Culture

Nebius AI R&D released SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents!

The 1.5M-sample dataset is multilingual, executable, and designed to benchmark real-world coding performance.

Deep Dive

Nebius AI R&D has launched SWE-rebench-V2, positioning it as the largest open, multilingual, and executable dataset specifically designed for training and benchmarking AI-powered code agents. This release addresses a critical gap in AI development for software engineering, where most existing datasets are static and lack the execution context needed to evaluate an agent's ability to produce functional, working code. By providing a massive corpus of executable tasks, Nebius aims to shift the benchmark from simple code completion to evaluating an AI's capacity to solve real-world programming problems, debug errors, and produce verifiably correct solutions.

The dataset contains approximately 1.5 million samples spanning more than 10 programming languages, from mainstream choices such as Python, Java, and C++ to Go and Rust. Its key differentiator is that each sample is executable, allowing models to be trained and tested in environments that mirror actual developer workflows. This supports the development of more robust "coding agents": AI systems that can autonomously take actions such as running tests, fixing bugs, or implementing features. For the AI research community, it provides a standardized, high-quality benchmark for measuring progress in a crucial applied domain, potentially accelerating the development of practical AI pair programmers and autonomous software engineers.
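The core idea behind executable samples is that an agent's output can be verified by actually running a task's tests rather than by string-matching reference code. As a minimal illustrative sketch (the real SWE-rebench-V2 schema and field names may differ; the sample below is hypothetical), a harness might run each task's test command and treat the process exit code as a pass/fail signal:

```python
import subprocess

# Hypothetical shape of one executable task; the actual
# SWE-rebench-V2 field names are assumptions for illustration.
sample = {
    "repo": "example/project",
    "problem_statement": "Fix the off-by-one error in pagination.",
    "test_command": "python -c \"assert 1 + 1 == 2\"",
}

def run_sample_tests(test_command: str, timeout: int = 60) -> bool:
    """Run a task's test command; exit code 0 means the tests pass."""
    result = subprocess.run(
        test_command, shell=True, capture_output=True, timeout=timeout
    )
    return result.returncode == 0

# After an agent edits the repo, the harness re-runs the tests to
# decide whether the proposed fix is verifiably correct.
print(run_sample_tests(sample["test_command"]))
```

This pass/fail signal is what makes such datasets usable both as a benchmark and as a reward for agent training, since correctness is checked by execution instead of by similarity to a reference patch.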

Key Points
  • Contains 1.5 million executable code samples for training AI agents
  • Supports 10+ programming languages including Python, Java, C++, Go, and Rust
  • Designed to benchmark real-world performance, moving beyond static code completion

Why It Matters

Provides the foundational data needed to build AI that can autonomously write, test, and debug real software.