P2T boosts SWE agents by 10.8 points using privileged patch supervision

arXiv cs.SE May 22, 2026

⚡New method uses the developer's own reference patch as a secret weapon for training…

Deep Dive

Supervised fine-tuning of open software-engineering (SWE) agents traditionally relies on long teacher trajectories, so students inherit not only the teacher's successful outcomes but also its intermediate flaws, such as ungrounded leaps or redundant loops. In a new arXiv paper, researchers from Microsoft and multiple universities introduce Patches-to-Trajectories (P2T), a training pipeline that puts the developer's own reference patch (p*) to work as privileged supervision. Instead of relying only on binary pass/fail signals, P2T first reverse-engineers p* into a latent process graph (G*) of contextual facts and solution milestones, then scores each step of blinded teacher trajectories against that graph under a leakage-blocking groundedness check. The system retains only the shortest effective segments, optimizing both per-step effectiveness and total trajectory length.
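To make the scoring idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the `Step` structure, the `score_steps` and `prune` functions, the node names, and the rule that a node only counts once its prerequisites have been established are all assumptions chosen for illustration. The prerequisite rule stands in for the groundedness check, and dropping zero-score steps stands in for keeping the shortest effective segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One teacher action (hypothetical structure, not from the paper)."""
    action: str          # what the teacher did, e.g. a tool call
    reaches: frozenset   # process-graph nodes the step claims to establish


def score_steps(steps, graph):
    """Score each step by how many new, grounded graph nodes it establishes.

    `graph` maps each node to the set of nodes that must be established first.
    A node reached before its prerequisites earns no credit (an ungrounded
    leap), and a node reached a second time earns no credit (a redundant loop).
    """
    established = set()
    scores = []
    for step in steps:
        gained = {
            node
            for node in step.reaches
            if node in graph
            and node not in established
            and graph[node] <= established
        }
        established |= gained
        scores.append(len(gained))
    return scores


def prune(steps, graph):
    """Keep only the steps that earned a positive score."""
    scores = score_steps(steps, graph)
    return [step for step, score in zip(steps, scores) if score > 0]


# Toy process graph: two contextual facts and one solution milestone.
graph = {
    "fact:failing_test": set(),
    "fact:buggy_function": {"fact:failing_test"},
    "milestone:fix_applied": {"fact:buggy_function"},
}

trajectory = [
    Step("run tests", frozenset({"fact:failing_test"})),
    Step("edit file blindly", frozenset({"milestone:fix_applied"})),  # ungrounded leap
    Step("run tests again", frozenset({"fact:failing_test"})),        # redundant loop
    Step("open source file", frozenset({"fact:buggy_function"})),
    Step("apply fix", frozenset({"milestone:fix_applied"})),
]

scores = score_steps(trajectory, graph)
kept_actions = [step.action for step in prune(trajectory, graph)]
```

In this toy run, the blind edit and the repeated test run score zero and are dropped, leaving a three-step trajectory. How the paper actually builds the graph from the reference patch, and how it blocks leakage of patch contents into the scored steps, is not described in this summary.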

The results are notable: using just 1,800 curated SWE-Gym instances, P2T outperforms both outcome-filtered SFT and its tool-error-masking variant across two benchmarks. On SWE-bench Verified, it lifts Pass@1 by up to 10.8 percentage points while cutting per-instance inference cost by roughly 15%. The team also reports consistent gains on SWE-bench Lite and uses size-matched ablations to show that the quality of the curated trajectories, not data scale, drives the improvement. For practitioners building code-fixing agents, P2T offers a practical way to extract far more signal from each training example by exploiting the reference patch, information that is typically discarded.

Key Points

P2T uses the developer's reference patch as privileged information to build a latent process graph (G*) for scoring trajectory steps.
The method raises Pass@1 on SWE-bench Verified by up to 10.8 points and reduces per-instance inference cost by ~15%.
Only 1,800 curated instances are needed; gains are attributed to trajectory quality, not data volume.

Why It Matters

P2T trains AI coding agents on fewer, higher-quality examples, cutting inference costs and improving accuracy on real-world bug-fixing benchmarks.
