Open Source

Meet SWE-rebench-V2, which Nebius describes as the largest open, multilingual, executable dataset for training code agents.

The dataset contains 32,000+ executable tasks across 20 languages, derived from real-world GitHub issues.

Deep Dive

Nebius, a cloud and AI infrastructure company, has released SWE-rebench-V2, a large open-source dataset built for training reinforcement learning (RL) agents on code generation and software engineering tasks. The release is a substantial scale-up from earlier, largely Python-centric datasets: it spans 20 programming languages and more than 32,000 executable tasks, each based on a real-world GitHub issue and packaged with a ready-to-run Docker environment. The dataset is meant to address a key bottleneck in AI research, namely the shortage of large, high-quality, executable environments for training code agents that can interact with real software systems.
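To make "executable task" concrete, here is a minimal Python sketch of what a single task record and the command to run its tests in a container might look like. The `Task` fields, the `docker_test_argv` helper, and the `/workspace` working directory are hypothetical illustrations, not SWE-rebench-V2's actual schema; only the `docker run` flags used (`--rm`, `-w`) are standard Docker CLI options.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical record for one executable task; field names are illustrative only."""
    instance_id: str
    repo: str
    language: str
    issue_text: str      # the GitHub issue the agent must resolve
    docker_image: str    # pre-built environment with dependencies installed
    test_command: str    # how to run the repository's tests inside the image
    fail_to_pass: List[str] = field(default_factory=list)  # tests a correct fix should turn green


def docker_test_argv(task: Task, workdir: str = "/workspace") -> List[str]:
    """Build a `docker run` argument list that runs the task's tests in a throwaway container."""
    return [
        "docker", "run", "--rm",  # delete the container when it exits
        "-w", workdir,            # assumed location of the repository checkout
        task.docker_image,
        "sh", "-c", task.test_command,
    ]


if __name__ == "__main__":
    task = Task(
        instance_id="example__repo-1",  # hypothetical values throughout
        repo="example/repo",
        language="lua",
        issue_text="Parser crashes on empty input",
        docker_image="example/repo-env:latest",
        test_command="make test",
    )
    # The list could be passed to subprocess.run(argv) on a machine with Docker installed.
    print(docker_test_argv(task))
```

Keeping the environment in a pre-built image means an RL harness only needs the image name and a test command to score a candidate patch, which is what makes tasks like these reproducible across machines.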

The dataset was built with an automated pipeline that extracts RL environments at scale, using an ensemble of LLMs to filter and label tasks for quality. The release also includes 120,000+ additional tasks derived from real pull requests, all enriched with metadata and tested interfaces to ensure solvability, and it is accompanied by a detailed technical report. For the developer community, this provides a standardized, multilingual playground for training open-source models and for benchmarking both open and proprietary models such as GPT-4 and Claude 3.5, which should help accelerate progress toward more capable, general-purpose AI coding assistants. Nebius also said it is continuing to develop the associated SWE-rebench leaderboard and invited collaboration via Discord.
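The LLM-ensemble filtering step can be illustrated with a simple majority vote. In this sketch each "judge" is a plain Python callable standing in for a call to one LLM; the function name `ensemble_filter`, the strict-majority default, and the label scheme are assumptions for illustration, since the source does not describe how Nebius aggregates the ensemble's verdicts.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical judge type: maps a task description to (keep?, label).
# In a real pipeline each judge would wrap a call to a different LLM.
Judge = Callable[[str], Tuple[bool, str]]


def ensemble_filter(
    tasks: Sequence[str],
    judges: Sequence[Judge],
    min_votes: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Keep tasks accepted by at least `min_votes` judges and attach the majority label."""
    if min_votes is None:
        min_votes = len(judges) // 2 + 1  # default: strict majority
    if min_votes < 1:
        raise ValueError("min_votes must be at least 1")

    kept: List[Dict[str, str]] = []
    for task in tasks:
        verdicts = [judge(task) for judge in judges]
        accepting_labels = [label for keep, label in verdicts if keep]
        if len(accepting_labels) >= min_votes:
            # Most frequent label among the judges that accepted the task.
            label, _ = Counter(accepting_labels).most_common(1)[0]
            kept.append({"task": task, "label": label})
    return kept


if __name__ == "__main__":
    # Deterministic stand-ins for LLM judges.
    def strict(text: str) -> Tuple[bool, str]:
        return ("test" in text, "bugfix" if "bug" in text else "feature")

    def lenient(text: str) -> Tuple[bool, str]:
        return (True, "bugfix" if "bug" in text else "feature")

    def picky(text: str) -> Tuple[bool, str]:
        return (len(text) > 20, "feature")

    tasks = ["fix bug in parser; add test", "typo", "new CLI flag with test coverage"]
    print(ensemble_filter(tasks, [strict, lenient, picky]))
    # "typo" is dropped because only one of the three judges accepts it.
```

Requiring agreement from several models is one common way to dampen any single model's mistakes; weighted votes or unanimous agreement would be equally plausible variants of the same idea.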

Key Points
  • Contains 32,000+ executable tasks, each derived from a real GitHub issue and shipped with a pre-built Docker environment.
  • Supports 20 programming languages, extending beyond Python to less common languages such as Lua and Clojure.
  • Includes 120,000+ extra tasks from pull requests, filtered and labeled for quality using an LLM ensemble.

Why It Matters

Provides the large-scale, executable training data needed to build the next generation of AI coding assistants and agents.