Triage: Routing Software Engineering Tasks to Cost-Effective LLM Tiers via Code Quality Signals
New research shows code quality metrics can cut AI coding costs by 40% without sacrificing output quality.
A new research paper titled 'Triage: Routing Software Engineering Tasks to Cost-Effective LLM Tiers via Code Quality Signals' introduces a framework that could significantly reduce the cost of using AI coding assistants. Developed by researcher Lech Madeyski, Triage addresses the common industry problem where AI coding agents default to using expensive frontier models like GPT-4 or Claude Opus for every task, even routine ones. The system uses code health metrics—indicators of software maintainability—as routing signals to assign each development task to the cheapest model tier capable of producing acceptable output.
Triage defines three capability tiers (light, standard, heavy) that correspond to model families such as Anthropic's Haiku, Sonnet, and Opus. The framework analyzes pre-computed code health sub-factors and task metadata to determine which tier can handle a given task while passing the same verification gate as more expensive models. The research establishes two key conditions for cost-effective routing: the light-tier model's pass rate on healthy code must exceed the inter-tier cost ratio, and code health must discriminate the required model tier with at least a small effect size (probability of superiority p ≥ 0.56, not a significance p-value).
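The first condition has a simple break-even reading: if a task is tried on the light tier and escalated to the heavy tier whenever it fails verification, the expected cost beats heavy-only exactly when the light-tier pass rate exceeds the light/heavy cost ratio. A minimal sketch, with illustrative per-task prices that are assumptions rather than the paper's figures:

```python
# Break-even sketch for tiered routing: try the light tier first and
# escalate to the heavy tier on verification failure. The prices below
# are illustrative assumptions, not values from the paper.

def expected_cost(pass_rate: float, cost_light: float, cost_heavy: float) -> float:
    """Expected cost of light-first routing with fallback to heavy on failure."""
    return cost_light + (1.0 - pass_rate) * cost_heavy

cost_light, cost_heavy = 0.25, 5.00   # hypothetical $/task
cost_ratio = cost_light / cost_heavy  # 0.05: the break-even pass rate

for pass_rate in (0.03, 0.05, 0.40):
    routed = expected_cost(pass_rate, cost_light, cost_heavy)
    print(f"pass_rate={pass_rate:.2f}: routed=${routed:.2f} "
          f"vs heavy-only=${cost_heavy:.2f} -> saves money: {routed < cost_heavy}")
```

Savings appear only once the pass rate clears the cost ratio (here 0.05), which is why the condition is stated as pass rate > inter-tier cost ratio.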
The evaluation, conducted on SWE-bench Lite with 300 tasks across three model tiers, compared three routing policies: heuristic thresholds, a trained ML classifier, and a perfect-hindsight oracle. This rigorous protocol allows teams to test cost-quality trade-offs and identify which specific code health factors drive routing decisions. By transforming diagnostic code quality metrics into actionable model-selection signals, Triage offers a practical approach to optimizing AI development costs without compromising on output quality.
- Routes tasks to 3 LLM tiers (light/standard/heavy) using code health metrics as signals
- Establishes mathematical conditions for cost savings: light-tier pass rate must exceed cost ratio
- Evaluated on 300 SWE-bench Lite tasks, showing inference-cost reductions of roughly 40% at comparable output quality
Why It Matters
Enables development teams to cut AI coding costs by 40%+ while maintaining output quality through intelligent model routing.