Developer Tools

From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents

New two-stage training method creates open-source coding agents that outperform most proprietary models on real-world engineering tasks.

Deep Dive

NVIDIA researchers have developed a training methodology called SWE-ZERO to SWE-HERO that produces highly capable software engineering AI agents from open-weight models. The two-stage pipeline first uses execution-free trajectories to teach code semantics and repository-level reasoning at scale, then applies targeted execution-backed refinement to convert that semantic understanding into rigorous engineering workflows. This staged approach sidesteps resource-heavy execution dependencies in the first stage, relying instead on a more efficient distillation process from frontier models such as Qwen3-Coder-480B.
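The execution-backed second stage can be pictured as filtering candidate agent trajectories by whether their final patch actually passes the repository's tests, keeping only verified solutions for further fine-tuning. A minimal sketch of that idea, where the trajectory format and the `passes_tests` oracle are illustrative assumptions rather than the paper's actual implementation:

```python
# Sketch: execution-based filtering of candidate trajectories.
# In practice, passes_tests would run the repo's real test suite
# against the agent's patch; here a toy oracle stands in for it.

def passes_tests(patched_fn) -> bool:
    """Stand-in for executing a repository's test suite against a patch."""
    try:
        return patched_fn(2, 3) == 5 and patched_fn(-1, 1) == 0
    except Exception:
        return False

# Candidate "trajectories": each ends in a proposed fix for a buggy add().
trajectories = [
    {"id": "t1", "patch": lambda a, b: a + b},  # correct fix
    {"id": "t2", "patch": lambda a, b: a - b},  # plausible but wrong
    {"id": "t3", "patch": lambda a, b: a * b},  # wrong
]

# Stage-2 filter: only execution-verified trajectories survive refinement.
verified = [t for t in trajectories if passes_tests(t["patch"])]
print([t["id"] for t in verified])  # ['t1']
```

Execution-free trajectories scale cheaply but can reward code that merely looks right; this kind of test-backed filter is what converts semantic plausibility into verified correctness.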

The resulting SWE-HERO-32B model achieves a remarkable 62.2% resolution rate on SWE-bench Verified, the best result to date among open-source models of comparable size. Even more impressive is the model's zero-shot transferability: despite being trained exclusively on Python, it reaches 44.1% on SWE-bench Multilingual, demonstrating that the paradigm generalizes across programming languages. The team is releasing a substantial dataset of 300k SWE-ZERO and 13k SWE-HERO trajectories, alongside agent suites based on the Qwen2.5-Coder series.

This research represents a significant step toward practical AI software engineers that can handle real-world development tasks. The execution-based refinement approach bridges the gap between theoretical code understanding and practical implementation, creating agents that don't just write code but can actually fix complex software issues in existing codebases. The open-source release of both methodology and datasets enables broader community development of capable coding assistants.

Key Points
  • SWE-HERO-32B achieves 62.2% resolution rate on SWE-bench Verified, outperforming most proprietary models
  • Two-stage training uses 300k execution-free trajectories followed by 13k execution-backed refinements
  • Demonstrates 44.1% zero-shot performance on multilingual benchmarks despite Python-only training

Why It Matters

Enables creation of open-source AI software engineers that can actually fix real bugs in production codebases.