Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents
A small fine-tuned model prunes 92% of irrelevant tool-output tokens while beating a far larger zero-shot model's recall by 11 points.
Researcher Ádám Kovács has introduced Squeez, a novel AI system designed to solve a critical inefficiency in automated coding assistants. Current coding agents waste significant computational resources processing lengthy, unfiltered outputs from tools like linters, test runners, and static analyzers, even though only a small fraction of that information is relevant to the next step. Squeez tackles this via 'task-conditioned tool-output pruning.' Given a specific coding task and a raw tool output, its sole job is to extract the smallest, most relevant verbatim block of evidence the agent should inspect next, dramatically shrinking the context the agent must carry and the time it spends processing it.
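The interface can be illustrated with a toy keyword-overlap heuristic, a crude stand-in for the learned model (the function name and scoring scheme below are illustrative, not from the paper; Squeez learns this selection end to end):

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word tokens, ignoring punctuation."""
    return {t.lower() for t in re.findall(r"\w+", text)}

def prune_tool_output(task: str, tool_output: str, window: int = 3) -> str:
    """Toy task-conditioned pruner: score each contiguous window of lines
    by token overlap with the task description and return the best window
    verbatim. Squeez replaces this heuristic with a fine-tuned model."""
    task_toks = tokens(task)
    lines = tool_output.splitlines()
    best_score, best_span = -1, ""
    for i in range(max(1, len(lines) - window + 1)):
        span = lines[i:i + window]
        score = sum(len(tokens(line) & task_toks) for line in span)
        if score > best_score:
            best_score, best_span = score, "\n".join(span)
    return best_span
```

Fed a failing-test log and the task "fix TypeError in parse_config", the heuristic keeps only the lines mentioning the error, discarding the rest of the log verbatim, which is the behavior Squeez is trained to produce reliably.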
The system was trained and evaluated on a substantial new benchmark of 11,477 examples derived from real SWE-bench repository interactions and synthetic multi-ecosystem outputs, with a manually curated 618-example test set. The technical approach involved fine-tuning a relatively small 2-billion-parameter Qwen 3.5 model using the parameter-efficient LoRA technique. The results are striking: Squeez achieves a high recall of 0.86 and an F1 score of 0.80 while pruning away a massive 92% of the input tokens on average. This performance notably surpasses a much larger, zero-shot 35-billion-parameter Qwen model by 11 points in recall and outperforms all heuristic pruning baselines by a wide margin.
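The reported numbers pair extraction quality with compression. A plausible token-level version of those metrics (the paper's exact definitions are not given here, so this is an assumption) can be sketched as:

```python
def span_metrics(predicted: str, gold: str, full_output: str):
    """Token-level recall/precision/F1 of a predicted evidence span against
    the gold span, plus the pruning rate (fraction of the raw output's
    tokens removed). Definitions are illustrative; the paper may compute
    these differently."""
    pred, ref, full = predicted.split(), gold.split(), full_output.split()
    overlap = len(set(pred) & set(ref))  # unique-token overlap
    recall = overlap / len(ref) if ref else 0.0
    precision = overlap / len(pred) if pred else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    pruned = 1 - len(pred) / len(full) if full else 0.0
    return recall, precision, f1, pruned
```

Under these definitions, extracting a 4-token evidence span from a 50-token tool output yields exactly the headline 92% pruning rate while keeping recall at 1.0 for a perfect extraction.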
This research matters because it directly addresses the scalability and cost of running sophisticated AI coding agents. By filtering out 92% of the noise, Squeez enables agents to operate faster, consume less computational power, and potentially make more precise decisions by focusing only on the signal. It represents a shift from simply using larger models to making existing agent architectures smarter and more efficient through targeted, learned filtering mechanisms. The release of the benchmark also provides a valuable new resource for the community to measure and improve upon this specific capability.
- Prunes 92% of input tokens from tool outputs while maintaining 0.86 recall.
- Outperforms a zero-shot Qwen 3.5 35B model by 11 points in recall, despite using a much smaller 2B model.
- Trained on a new benchmark of 11,477 examples built from SWE-bench and synthetic data.
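The parameter-efficiency argument behind using LoRA on a small model comes down to simple arithmetic: the base weights stay frozen and only a low-rank update is trained. The layer dimensions below are illustrative, not the paper's actual configuration:

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """For a d x k weight matrix W, LoRA freezes W and trains a low-rank
    update B @ A, with B of shape d x r and A of shape r x k. Trainable
    parameters drop from d*k (full fine-tuning) to r*(d + k)."""
    return d * k, r * (d + k)

# Illustrative layer size (not from the paper): a 2048 x 2048 projection
# with LoRA rank 16 trains under 2% of the layer's parameters.
full, lora = lora_param_counts(2048, 2048, 16)
```

This is why a 2B-parameter model can be adapted to the pruning task cheaply: the vast majority of weights are never updated.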
Why It Matters
Makes AI coding assistants significantly faster and cheaper to run by eliminating irrelevant data processing.