SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
Researchers release a dataset of more than 32,000 executable, real-world coding tasks for training the next generation of software engineering AI agents.
A research team led by Ibragim Badertdinov, Maksim Nekrashevich, Anton Shevtsov, and Alexander Golubev has released SWE-rebench V2, a major new resource for training software engineering AI agents. The core problem they address is the scarcity of large-scale, executable task collections needed for effective reinforcement learning (RL) training. While benchmarks for evaluation exist, datasets suitable for training remain limited in scale, diversity, and language support. SWE-rebench V2 introduces an automated pipeline that harvests real-world tasks from GitHub repositories, synthesizes installation and test procedures via an interactive agent, and filters unsound instances using an ensemble of LLM judges validated against human annotations.
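The filtering step can be pictured with a small sketch. The summary says only that an ensemble of LLM judges is used and that the judges are validated against human annotations; the majority-vote rule, the `Judge` type, and the function names below are assumptions for illustration, and the keyword heuristics are toy stand-ins for actual LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical judge type: maps a task description to True ("sound") or False ("unsound").
# In a real pipeline each judge would be an LLM call; here they are plain functions.
Judge = Callable[[str], bool]


@dataclass
class Task:
    task_id: str
    description: str


def judge_accuracy(judge: Judge, labeled: Sequence[Tuple[str, bool]]) -> float:
    """Fraction of human-annotated examples on which the judge matches the human label."""
    if not labeled:
        raise ValueError("need at least one human-labeled example")
    hits = sum(judge(text) == label for text, label in labeled)
    return hits / len(labeled)


def ensemble_is_sound(judges: Sequence[Judge], description: str) -> bool:
    """Assumed aggregation rule: keep a task only if a strict majority of judges call it sound."""
    votes = sum(judge(description) for judge in judges)
    return votes * 2 > len(judges)


def filter_tasks(tasks: Sequence[Task], judges: Sequence[Judge]) -> List[Task]:
    """Drop every task the ensemble considers unsound."""
    return [t for t in tasks if ensemble_is_sound(judges, t.description)]


# Toy keyword heuristics standing in for LLM judges (illustration only).
def has_repro(text: str) -> bool:
    return "steps to reproduce" in text.lower()


def long_enough(text: str) -> bool:
    return len(text.split()) >= 5


def states_expectation(text: str) -> bool:
    return "expected" in text.lower()


judges = [has_repro, long_enough, states_expectation]
tasks = [
    Task("a", "Steps to reproduce: call parse(); expected an int, got a str"),
    Task("b", "fix it"),
]
print([t.task_id for t in filter_tasks(tasks, judges)])  # ['a']
```

Scoring each judge with `judge_accuracy` on a human-labeled sample is one plausible way to decide which judges belong in the ensemble; a weighted or unanimous vote would be equally reasonable alternatives to the strict majority assumed here.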
The pipeline has produced two datasets: a high-quality set of more than 32,000 tasks with pre-built Docker images for reproducible execution, and an extended set of more than 120,000 tasks with installation instructions and metadata. These tasks span 20 programming languages and more than 3,600 repositories, extending well beyond the usual focus on high-resource languages such as Python. The release includes the collection code, execution environments, and instance-level metadata that flags common issues such as overly restrictive tests. Together, these resources supply the training data needed for more capable, generalist SWE agents that can work across diverse codebases and tech stacks, taking AI-assisted coding beyond simple code completion to complex, real-world software engineering workflows.
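To illustrate how instance-level metadata might be used when assembling an RL training pool, here is a minimal sketch. The `TaskInstance` fields, the flag string `"overly_restrictive_tests"`, and the repository and image names are all hypothetical; the actual schema is defined by the released metadata.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Optional, Set


@dataclass
class TaskInstance:
    # Field names are hypothetical; the released metadata defines the actual schema.
    instance_id: str
    repo: str
    language: str
    docker_image: Optional[str]  # None when only installation instructions are available
    flags: Set[str] = field(default_factory=set)  # e.g. {"overly_restrictive_tests"}


def select_for_rl(
    instances: Iterable[TaskInstance],
    languages: Optional[Set[str]] = None,
    exclude_flags: FrozenSet[str] = frozenset({"overly_restrictive_tests"}),
) -> List[TaskInstance]:
    """Keep instances with a pre-built image, a matching language, and no excluded flag."""
    selected = []
    for inst in instances:
        if inst.docker_image is None:
            continue  # not directly executable without building an environment first
        if languages is not None and inst.language not in languages:
            continue
        if inst.flags & exclude_flags:
            continue
        selected.append(inst)
    return selected


def language_counts(instances: Iterable[TaskInstance]) -> Counter:
    """Number of instances per language, e.g. to check coverage before training."""
    return Counter(inst.language for inst in instances)


pool = [
    TaskInstance("go-1", "example/go-repo", "go", "example-image:go-1"),
    TaskInstance("rs-1", "example/rs-repo", "rust", "example-image:rs-1", {"overly_restrictive_tests"}),
    TaskInstance("py-1", "example/py-repo", "python", None),
]
print([i.instance_id for i in select_for_rl(pool)])  # ['go-1']
```

Treating flags as filters rather than deleting flagged tasks outright leaves the choice to the user: a flagged instance can still be kept for evaluation or analysis while being excluded from reward computation.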
- Creates a dataset of 32,000+ executable SWE tasks with pre-built environments for RL training
- Supports 20 programming languages across 3,600+ repositories, moving beyond the typical Python/JavaScript focus
- Uses an interactive setup agent and LLM judge ensemble to automate task harvesting and validation
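The interactive setup agent can be sketched as a propose-execute-observe loop. This is an assumption-laden illustration, not the authors' agent: `propose` and `run` are hypothetical callables (an LLM policy and a sandboxed shell in practice), and success is simplified to the test command exiting with code 0, whereas a real harness would check which tests are actually collected and run.

```python
from typing import Callable, List, Optional, Tuple

# Each history entry is (command, exit_code, output) as observed by the agent.
History = List[Tuple[str, int, str]]


def interactive_setup(
    propose: Callable[[History], Optional[str]],
    run: Callable[[str], Tuple[int, str]],
    test_command: str,
    max_steps: int = 10,
) -> Optional[List[str]]:
    """Let an agent propose shell commands until the test command exits with code 0.

    Returns the commands that succeeded (a candidate install-and-test recipe),
    or None if the agent gives up or runs out of steps.
    """
    history: History = []
    for _ in range(max_steps):
        cmd = propose(history)  # the agent sees every previous command and its output
        if cmd is None:
            return None
        code, output = run(cmd)
        history.append((cmd, code, output))
        if cmd == test_command and code == 0:
            # Simplification: failed commands are dropped from the recipe, which assumes
            # they left no side effects that later steps depend on.
            return [c for c, rc, _ in history if rc == 0]
    return None


# Scripted "agent" and fake executor so the sketch runs without a container.
script = ["make install", "pip install -e .", "pytest -q"]
fake_results = {
    "make install": (2, "make: *** No rule to make target 'install'."),
    "pip install -e .": (0, "Successfully installed example-pkg"),
    "pytest -q": (0, "3 passed"),
}


def scripted_agent(history: History) -> Optional[str]:
    return script[len(history)] if len(history) < len(script) else None


def fake_run(cmd: str) -> Tuple[int, str]:
    return fake_results[cmd]


print(interactive_setup(scripted_agent, fake_run, "pytest -q"))
# ['pip install -e .', 'pytest -q']
```

Passing the full history back to `propose` is what makes the loop interactive: the agent can read an error such as the failed `make install` and choose a different command next, instead of following a fixed script.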
Why It Matters
Provides the massive, diverse training data needed to build AI agents that can fix bugs and implement features in real-world, multi-language codebases.