SWE-rebench V2 scales AI coding training with 32,000+ tasks across 20 languages

arXiv cs.SE March 02, 2026

⚡ Researchers release a dataset of 32,000+ real-world coding tasks for training software engineering AI agents.

Deep Dive

A research team led by Ibragim Badertdinov, Maksim Nekrashevich, Anton Shevtsov, and Alexander Golubev has released SWE-rebench V2, a major new resource for training software engineering AI agents. The core problem they address is the scarcity of large-scale, executable task collections needed for effective reinforcement learning (RL) training. While benchmarks for evaluation exist, datasets suitable for training remain limited in scale, diversity, and language support. SWE-rebench V2 introduces an automated pipeline that harvests real-world tasks from GitHub repositories, synthesizes installation and test procedures via an interactive agent, and filters unsound instances using an ensemble of LLM judges validated against human annotations.
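The filtering step can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the article does not say how the judges' verdicts are combined, so a strict majority vote is assumed here, and `TaskInstance`, `is_sound`, `filter_tasks`, and the three toy judges are hypothetical names standing in for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TaskInstance:
    """Hypothetical stand-in for one harvested task (not the real schema)."""
    task_id: str
    issue_text: str
    test_patch: str


# A judge maps a task to True (looks sound) or False (looks unsound).
# In a real pipeline each judge would wrap an LLM call; here they are toys.
Judge = Callable[[TaskInstance], bool]


def is_sound(task: TaskInstance, judges: List[Judge], min_votes: int) -> bool:
    """Accept the task if at least `min_votes` judges accept it."""
    votes = sum(1 for judge in judges if judge(task))
    return votes >= min_votes


def filter_tasks(
    tasks: List[TaskInstance],
    judges: List[Judge],
    min_votes: Optional[int] = None,
) -> List[TaskInstance]:
    """Keep tasks that pass the ensemble; defaults to a strict majority (assumed)."""
    if min_votes is None:
        min_votes = len(judges) // 2 + 1
    return [task for task in tasks if is_sound(task, judges, min_votes)]


# Toy judges standing in for LLM judgments.
def has_issue_text(task: TaskInstance) -> bool:
    return bool(task.issue_text.strip())


def has_tests(task: TaskInstance) -> bool:
    return bool(task.test_patch.strip())


def issue_not_trivial(task: TaskInstance) -> bool:
    return len(task.issue_text.split()) >= 5


tasks = [
    TaskInstance("a", "Parser crashes on empty input files", "def test_empty(): ..."),
    TaskInstance("b", "fix", ""),
]
judges = [has_issue_text, has_tests, issue_not_trivial]
kept = filter_tasks(tasks, judges)
print([task.task_id for task in kept])  # prints ['a']
```

Task "a" is accepted by all three toy judges, while task "b" gets only one vote and is dropped. Validating such an ensemble against human annotations, as the article describes, would amount to comparing these accept/reject decisions with human labels on a held-out sample.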

The pipeline has produced two datasets: a high-quality set of 32,000+ tasks with pre-built Docker images for reproducible execution, and an extended set of 120,000+ tasks with installation instructions and metadata. These tasks span 20 programming languages and more than 3,600 repositories, well beyond the usual focus on high-resource languages such as Python. The release includes the collection code, execution environments, and instance-level metadata that flags common issues such as overly restrictive tests. Together these provide the data needed to train more capable, generalist SWE agents that work across diverse codebases and tech stacks, extending AI-assisted coding from simple code completion to complex, real-world software engineering workflows.

Key Points

Creates a dataset of 32,000+ executable SWE tasks with pre-built environments for RL training
Supports 20 programming languages across 3,600+ repositories, breaking the Python/JS dominance
Uses an interactive setup agent and LLM judge ensemble to automate task harvesting and validation

Why It Matters

The release provides large-scale, diverse training data for building AI agents that can fix bugs and implement features in real-world, multi-language codebases.
