Developer Tools

P2T boosts SWE agents by 10.8 points using privileged patch supervision

New method uses the developer's own reference patch as a secret weapon for training…

Deep Dive

Supervised fine-tuning of open software-engineering (SWE) agents traditionally relies on long teacher trajectories, so student models inherit not only the teacher's successful outcomes but also its intermediate flaws, such as ungrounded leaps and redundant loops. In a new arXiv paper, researchers from Microsoft and multiple universities introduce Patches-to-Trajectories (P2T), a training pipeline that uses the developer's own reference patch (p*) as privileged supervision. Instead of relying only on a binary pass/fail signal, P2T first reverse-engineers p* into a latent process graph (G*) of contextual facts and solution milestones. It then scores each step of blinded teacher trajectories (generated without access to p*) against that graph, under a groundedness check designed to block leakage of the patch. Finally, the system retains only the shortest effective segments, optimizing for both per-step effectiveness and total trajectory length.
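The scoring-and-pruning idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the article does not describe how P2T builds G* or computes scores, so the graph is reduced to a flat set of node labels, and the names Step, score_step, and curate are hypothetical rather than taken from the paper. A step earns credit only for graph nodes it newly reaches and that are backed by something the agent has actually observed; steps that earn nothing are dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One teacher step (hypothetical structure, not the paper's)."""
    action: str           # what the agent did
    observed: frozenset   # facts visible in this step's tool output
    claimed: frozenset    # graph nodes the step relies on or reaches


def score_step(step, graph_nodes, seen, covered):
    """Return the graph nodes this step newly reaches AND that are grounded.

    A claimed node counts only if it is in the graph, has not been covered
    by an earlier kept step, and appears in what the agent has observed so
    far (including this step). Unobserved claims model "ungrounded leaps".
    """
    visible = seen | step.observed
    return {
        node for node in step.claimed
        if node in graph_nodes and node in visible and node not in covered
    }


def curate(trajectory, graph_nodes):
    """Greedy toy filter: keep only steps that add grounded graph coverage."""
    seen, covered, kept = set(), set(), []
    for step in trajectory:
        gained = score_step(step, graph_nodes, seen, covered)
        seen |= step.observed
        if gained:
            kept.append(step)
            covered |= gained
    return kept, covered


# Hypothetical graph nodes standing in for facts and milestones in G*.
graph_nodes = {"bug_in_parse_date", "tz_dropped", "patch_parse_date"}

trajectory = [
    Step("grep parse_date", frozenset({"bug_in_parse_date"}), frozenset({"bug_in_parse_date"})),
    Step("grep parse_date again", frozenset({"bug_in_parse_date"}), frozenset({"bug_in_parse_date"})),
    Step("guess: tz is dropped", frozenset(), frozenset({"tz_dropped"})),
    Step("run repro script", frozenset({"tz_dropped"}), frozenset({"tz_dropped"})),
    Step("edit parse_date", frozenset({"patch_parse_date"}), frozenset({"patch_parse_date"})),
]

kept, covered = curate(trajectory, graph_nodes)
print([s.action for s in kept])
```

In this toy run the repeated grep (a redundant loop) and the unobserved guess (an ungrounded leap) are dropped, while the three steps that each reach a new, observed graph node are kept. The real pipeline presumably uses richer graph structure and scoring than this set-membership check.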

Using just 1,800 curated SWE-Gym instances, P2T outperforms both outcome-filtered SFT and its tool-error-masking variant across two benchmarks. On SWE-bench Verified, it lifts Pass@1 by up to 10.8 percentage points while cutting per-instance inference cost by roughly 15%. The team also reports consistent gains on SWE-bench Lite and uses size-matched ablations to show that the quality of the curated trajectories, not data scale, drives the improvement. For practitioners building code-fixing agents, P2T offers a practical way to extract more signal from each training example by exploiting information that is typically discarded.
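To make the headline figure concrete, the short sketch below converts a percentage-point gain in Pass@1 into a count of additional resolved instances. It assumes SWE-bench Verified contains 500 instances, which comes from the benchmark's public description and is not stated in this article. The function names are illustrative, and because the article gives no baseline score, only the size of the best-case gap is computed.

```python
def pass_at_1(resolved: int, total: int) -> float:
    """Pass@1 as a percentage: instances resolved on the single attempt."""
    return 100.0 * resolved / total


def points_to_instances(points: float, total: int) -> int:
    """Convert a percentage-point gain into a whole number of instances."""
    return round(points * total / 100)


VERIFIED_SIZE = 500  # assumed benchmark size, not given in the article

extra = points_to_instances(10.8, VERIFIED_SIZE)
print(extra)                            # additional instances resolved
print(pass_at_1(extra, VERIFIED_SIZE))  # the same gap expressed as a percentage
```

Under that assumption, the best-case gain of 10.8 percentage points corresponds to 54 more resolved instances out of 500. This is an absolute difference between two Pass@1 scores, not a 10.8% relative improvement over the baseline.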

Key Points
  • P2T uses the developer's reference patch as privileged information to build a latent process graph (G*) for scoring trajectory steps.
  • The method raises Pass@1 on SWE-bench Verified by up to 10.8 percentage points and reduces per-instance inference cost by ~15%.
  • Only 1,800 curated instances are used; gains are attributed to trajectory quality, not data volume.

Why It Matters

P2T trains AI coding agents on fewer, higher-quality examples, cutting inference cost and improving accuracy on real-world bug-fixing benchmarks.