Developer Tools

EvidenT framework repairs nearly 54% of system-level package build failures

New framework outperforms state-of-the-art agentic baselines by 2.6x on 219 RISC-V package build failures.

Deep Dive

A team of Microsoft researchers and university collaborators has released EvidenT, a framework for system-level package repair. LLMs handle project-level source fixes well, but system-level failures span multi-language artifacts such as build recipes, scripts, and source archives, and each candidate fix must be validated iteratively through an external build service. The team's empirical study of real-world failures found that 72% stem from dependency and environment misconfigurations rather than isolated code defects, a finding that drives EvidenT's design.

EvidenT has three core components: an external Build Service that provides reproducible execution and feedback, an Evidence-Preserving Repair Controller that fuses repair history, knowledge context, and build artifacts, and an automated Repair Orchestrator that invokes modular tools in a closed-loop validation environment. On 219 real-world RISC-V package build failures, EvidenT repaired 118 packages (53.88%), compared with 20.55% for state-of-the-art agentic baselines and 1.83% for direct LLM-based repair. The framework also carries over to other architectures, reaching repair rates of 41.77% on aarch64 and 46.99% on x86_64 with only ISA-specific knowledge updates. The results suggest that automated system-level package repair is becoming practical across diverse hardware ecosystems.
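
To make the closed-loop idea concrete, here is a minimal Python sketch of how a build service, an evidence store, and a tool-invoking loop might fit together. It is not EvidenT's implementation. Every name in it (BuildResult, Evidence, fake_build_service, add_missing_dependency, repair_loop), the recipe layout, and the zlib example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Recipe = Dict[str, List[str]]


@dataclass
class BuildResult:
    ok: bool
    log: str


@dataclass
class Evidence:
    """What the controller keeps between attempts (hypothetical structure)."""
    history: List[str] = field(default_factory=list)         # repairs already applied
    knowledge: Dict[str, str] = field(default_factory=dict)  # e.g. header -> package hints
    last_log: str = ""                                        # most recent build log


# A repair tool inspects the recipe and evidence and returns a patched recipe or None.
Tool = Callable[[Recipe, Evidence], Optional[Recipe]]


def fake_build_service(recipe: Recipe) -> BuildResult:
    """Stand-in for an external build service: fails until zlib-devel is declared."""
    if "zlib-devel" not in recipe["build_requires"]:
        return BuildResult(False, "error: zlib.h: No such file or directory")
    return BuildResult(True, "build succeeded")


def add_missing_dependency(recipe: Recipe, evidence: Evidence) -> Optional[Recipe]:
    """Toy tool: map a header named in the log to a package via the knowledge context."""
    for header, package in evidence.knowledge.items():
        if header in evidence.last_log and package not in recipe["build_requires"]:
            # Return a patched copy so the caller's recipe is left untouched.
            return {**recipe, "build_requires": recipe["build_requires"] + [package]}
    return None


def repair_loop(recipe: Recipe,
                build: Callable[[Recipe], BuildResult],
                tools: Dict[str, Tool],
                evidence: Evidence,
                max_rounds: int = 3):
    """Build, record the log as evidence, apply one tool, and rebuild until success."""
    for round_no in range(1, max_rounds + 1):
        result = build(recipe)
        evidence.last_log = result.log
        if result.ok:
            return recipe, True
        for name, tool in tools.items():
            patched = tool(recipe, evidence)
            if patched is not None:
                evidence.history.append(f"round {round_no}: {name}")
                recipe = patched
                break
        else:
            return recipe, False  # no tool proposed a change, so give up
    # Validate whatever the final round produced.
    return recipe, build(recipe).ok


if __name__ == "__main__":
    evidence = Evidence(knowledge={"zlib.h": "zlib-devel"})
    fixed, ok = repair_loop({"build_requires": ["gcc"]},
                            fake_build_service,
                            {"add_missing_dependency": add_missing_dependency},
                            evidence)
    print(ok, fixed["build_requires"], evidence.history)
```

In this toy run the first build fails on a missing header, the tool adds the matching package, and the second build validates the fix. The point of the sketch is only the shape of the loop: every attempt is checked by a real build, and the log and repair history persist across rounds.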

Key Points
  • EvidenT repaired 53.88% of 219 real-world RISC-V package build failures, vs 20.55% for agentic baselines and 1.83% for direct LLM repair
  • 72% of system-level failures come from dependency and environment misconfigurations, not code defects
  • Cross-architecture tests show 41.77% (aarch64) and 46.99% (x86_64) success rates with only ISA-specific knowledge updates
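
The headline figures can be cross-checked with a few lines of arithmetic. The short Python sketch below uses only the numbers reported above. The variable names and the relative_gain helper are illustrative and not from the source.

```python
# Figures as reported in the summary above.
TOTAL_FAILURES = 219
REPAIRED = 118
AGENTIC_RATE = 20.55      # percent, state-of-the-art agentic baselines
DIRECT_LLM_RATE = 1.83    # percent, direct LLM-based repair

# Repair rate as a percentage of all evaluated failures.
evident_rate = 100 * REPAIRED / TOTAL_FAILURES


def relative_gain(rate: float, baseline: float) -> float:
    """How many times higher `rate` is than `baseline`."""
    return rate / baseline


if __name__ == "__main__":
    print(f"EvidenT repair rate: {evident_rate:.2f}%")
    print(f"vs agentic baselines: {relative_gain(evident_rate, AGENTIC_RATE):.1f}x")
    print(f"vs direct LLM repair: {relative_gain(evident_rate, DIRECT_LLM_RATE):.1f}x")
```

The 2.6x figure in the deck corresponds to the comparison with agentic baselines. The gap to direct LLM repair is far larger, at roughly 29x.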

Why It Matters

EvidenT automates the painful system-level package repairs that stymie developers, which could reduce build failures across multiple CPU architectures.