daVinci-Env: Open SWE Environment Synthesis at Scale
A $1.47M open-source project yields 13k curated coding trajectories for training superior software engineering agents.
A team of academic researchers has introduced daVinci-Env (branded as OpenSWE), a groundbreaking open-source framework designed to solve a critical bottleneck in AI research: the lack of large-scale, executable environments for training software engineering (SWE) agents. Existing datasets are limited, and industrial solutions are proprietary, creating a high barrier for academic labs. OpenSWE tackles this by providing 45,320 fully executable Docker environments sourced from over 12,800 Python repositories, with all Dockerfiles, evaluation scripts, and infrastructure publicly released for full reproducibility. The entire framework was constructed using a sophisticated multi-agent synthesis pipeline deployed across a 64-node distributed computing cluster, automating tasks like repository exploration and Dockerfile generation.
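The paper's synthesis pipeline itself is not reproduced here, but its validation step can be sketched: for each candidate environment, build the generated Dockerfile and smoke-test the evaluation script, keeping only environments that build and run cleanly. This is a minimal sketch under stated assumptions; the function name `build_and_validate` and the `/eval.sh` script path are hypothetical, not the released infrastructure's actual layout.

```python
import subprocess
from pathlib import Path

def build_and_validate(env_dir: Path, tag: str, timeout: int = 1800) -> bool:
    """Build a synthesized Docker environment and smoke-test it.

    Hypothetical sketch: assumes `env_dir` contains a generated Dockerfile
    and that the image ships an /eval.sh evaluation script. Returns True
    only if both the build and the evaluation run succeed.
    """
    try:
        build = subprocess.run(
            ["docker", "build", "-t", tag, str(env_dir)],
            capture_output=True, timeout=timeout,
        )
        if build.returncode != 0:
            return False  # generated Dockerfile does not build for this repo
        run = subprocess.run(
            ["docker", "run", "--rm", tag, "bash", "/eval.sh"],
            capture_output=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # docker unavailable, or build/run hung past the limit
    return run.returncode == 0
```

Environments failing either step would simply be dropped, which is how a pipeline of this kind can scale to tens of thousands of repositories without manual inspection.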
Beyond sheer scale, the project's core innovation is a quality-centric filtering pipeline that characterizes the inherent difficulty of each coding task. This process, which cost approximately $576,000 for trajectory sampling and curation, filters out problems that are either trivial or unsolvable, retaining only about 9,000 high-quality, challenging environments. This curation yielded roughly 13,000 high-quality training trajectories. The investment in quality pays off: models like OpenSWE-32B and OpenSWE-72B, fine-tuned on this data, achieve state-of-the-art scores of 62.4% and 66.0% respectively on the SWE-bench Verified benchmark. Remarkably, this SWE-focused training also leads to substantial 'out-of-domain' improvements, boosting performance on mathematical reasoning by up to 12 points and on science benchmarks by 5 points, without harming factual recall capabilities.
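The difficulty-aware filter can be approximated as a solve-rate screen: sample several agent trajectories per task and keep only tasks that are neither always solved (trivial) nor never solved (likely broken or genuinely unsolvable). The thresholds and names below are illustrative assumptions, not the paper's exact criteria.

```python
def filter_tasks(sampled: dict[str, list[bool]]) -> list[str]:
    """Keep tasks whose empirical solve rate is strictly between 0 and 1.

    Illustrative assumption: `sampled` maps each task id to the pass/fail
    outcomes of k sampled agent trajectories. A rate of 0 suggests the task
    is unsolvable or its environment is broken; a rate of 1 suggests it is
    trivial. Both ends are dropped, mirroring the curation's stated intent.
    """
    kept = []
    for task_id, outcomes in sampled.items():
        rate = sum(outcomes) / len(outcomes)
        if 0.0 < rate < 1.0:
            kept.append(task_id)
    return kept

# Hypothetical usage: only the partially solved task survives.
sampled = {
    "repo-a#12": [True, False, True, False],   # challenging but solvable
    "repo-b#7":  [True, True, True, True],     # trivial
    "repo-c#3":  [False, False, False, False], # likely broken
}
print(filter_tasks(sampled))
```

A screen of this shape would explain the steep reduction reported above, from 45,320 environments down to roughly 9,000 retained.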
- Creates 45,320 executable Docker environments from 12.8k Python repos, fully open-sourcing all infrastructure for reproducibility.
- Uses a multi-agent synthesis pipeline on a 64-node cluster (part of the project's $1.47M total budget) to automate environment construction, plus a difficulty-aware quality filter for curation.
- Models fine-tuned on the data score up to 66.0% (OpenSWE-72B) on SWE-bench Verified and gain up to 12 points on math reasoning, showing strong out-of-domain generalization.
Why It Matters
Democratizes high-quality SWE agent training for academia, potentially accelerating the development of AI that can reliably write and debug complex code.