Developer Tools

SWE-TRACE: Optimizing Long-Horizon SWE Agents Through Rubric Process Reward Models and Heuristic Test-Time Scaling

New AI agent framework slashes token use by 40% while solving complex software bugs faster.

Deep Dive

A research team led by Hao Han and Hongkai Chen has introduced SWE-TRACE (Trajectory Reduction and Agentic Criteria Evaluation), a framework designed to overcome three key bottlenecks in autonomous AI coding agents: inefficient demonstration data, sparse rewards that evaluate only final outcomes, and computationally heavy inference methods that lead to token bloat and policy degradation. SWE-TRACE addresses these through a three-pronged approach. First, it uses LLM multi-task cascading with stepwise verification to create a 60,000-instance Supervised Fine-Tuning (SFT) dataset biased toward the most token-efficient, shortest-path solutions.
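The article does not spell out the distillation pipeline, but the core idea of cascading teacher models with stepwise verification and a shortest-path bias can be sketched in Python. This is a minimal illustration under assumed interfaces; Trajectory, teachers, and verify_step are hypothetical names, not the paper's actual API:

    from dataclasses import dataclass

    @dataclass
    class Trajectory:
        steps: list        # e.g. (thought, action, observation) tuples
        token_count: int

    def distill_sft_example(task, teachers, verify_step, num_samples=8):
        """Cascade over teacher models (cheapest first); keep only
        trajectories whose every intermediate step passes verification,
        then return the most token-efficient survivor. Returns None if
        no teacher produces a fully verified trajectory."""
        verified = []
        for sample_traj in teachers:          # each teacher: task -> Trajectory
            for _ in range(num_samples):
                traj = sample_traj(task)
                if all(verify_step(task, step) for step in traj.steps):
                    verified.append(traj)
            if verified:                      # stop cascading once a model succeeds
                break
        if not verified:
            return None                       # instance dropped from the corpus
        return min(verified, key=lambda t: t.token_count)  # shortest-path bias

Under this reading, the cascade keeps cheap models doing most of the generation, while the stepwise check and the final min() are what bias the 60K corpus toward short, token-efficient demonstrations.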

Second, to provide better guidance during long, complex coding tasks, the team developed a Memory-Augmented Agentic Reinforcement Learning pipeline featuring a Rubric-Based Process Reward Model (PRM). The PRM employs an auxiliary 'Rubric-Agent' that gives dense, fine-grained feedback on intermediate steps, preventing the agent from getting stuck or pursuing inefficient paths. Finally, the framework bridges training and inference by repurposing the PRM for heuristic-guided Test-Time Scaling (TTS), which dynamically evaluates and prunes action candidates at each decision point. This eliminates the latency overhead of standard parallel sampling while preserving search efficiency.
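A minimal sketch of how a process reward model could double as a test-time pruning heuristic, assuming hypothetical propose_action and prm_score callables (the paper's real interfaces are not given in the article):

    def prm_guided_step(state, propose_action, prm_score, k=4):
        """At a single decision point, sample k candidate actions, score
        each with the rubric-based PRM, and commit to the best one, so
        discarded candidates never incur further rollout tokens (unlike
        standard parallel sampling, which rolls out full trajectories)."""
        candidates = [propose_action(state) for _ in range(k)]
        # The Rubric-Agent would grade each intermediate action against
        # criteria such as progress toward the fix, redundancy, and
        # tool-use validity; prm_score stands in for that judgment here.
        return max(candidates, key=lambda a: prm_score(state, a))

Because only the top-scored candidate is rolled forward, the per-step cost stays close to that of a single trajectory, which is plausibly where the reported latency savings over full parallel sampling come from.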

Extensive testing on established software engineering benchmarks shows that SWE-TRACE advances the state of the art: it raises problem resolution rates while sharply cutting both token consumption, a major cost factor, and inference latency. This makes autonomous coding agents more practical for real-world deployment, where efficiency and accuracy are both critical.

Key Points
  • Uses a 60K-instance SFT corpus distilled via LLM cascading to focus on shortest-path, token-efficient coding trajectories
  • Introduces a Rubric-Based Process Reward Model with an auxiliary agent for dense, step-by-step feedback on long-horizon tasks
  • Applies heuristic Test-Time Scaling to dynamically prune action candidates, reducing token use and latency without sacrificing performance

Why It Matters

Makes AI coding assistants more efficient and reliable for complex software engineering tasks, reducing operational costs and improving developer productivity.