New filter cuts LLM context waste by 89% with near-zero overhead
The SizeFilter alone delivers 79.6% token reduction at 0.30 ms – no indexing required.
A new arXiv paper by Shweta Mishra tackles a critical bottleneck in LLM-powered developer tools: context window inefficiency. Earlier work by Paulsen showed that models degrade well before reaching their advertised context limits (the "Maximum Effective Context Window"), making context construction a quality problem, not just a capacity one. Modern repositories often contain massive non-code artifacts – compiled datasets, model weights, minified JS bundles, gigabyte logs – that crowd relevant source code out of the prompt.
Mishra's solution is a lightweight, correctness-aware context hygiene framework that runs before tokenization. It uses only OS-level stat() metadata, requiring no indexing or semantic retrieval (unlike RepoCoder, GraphRAG, or AST-based chunking). The SizeFilter at a 1 MB threshold achieves 79.6% mean token reduction at 0.30 ms overhead across 10 open-source repos (22,046 files, 5 languages). A HybridFilter reaches 89.3% reduction with the lowest variance of any evaluated method.
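The paper's actual interface isn't reproduced in this summary, but the core idea is simple enough to sketch. Below is a minimal Python illustration, assuming a plain size threshold over os.stat() metadata; the function and variable names are ours, not the paper's API:

```python
import os
from pathlib import Path
from typing import Iterator

ONE_MB = 1024 * 1024  # the 1 MB threshold evaluated in the paper

def size_filter(root: str, max_bytes: int = ONE_MB) -> Iterator[Path]:
    """Yield repo files small enough to include in LLM context.

    Uses only stat() metadata: no file is ever opened or read,
    so the per-file cost is a single system call.
    """
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            if path.stat().st_size <= max_bytes:
                yield path
        except OSError:
            continue  # broken symlink, permission error, etc.

# Example: list candidate context files for a repo checkout.
if __name__ == "__main__":
    for f in size_filter("."):
        print(f)
```

Because the decision depends only on metadata, the filter can run on every request, before tokenization, with no index to build or keep fresh.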
In a limited evaluation with CodeLlama-7B-Instruct (18 tasks), the filter boosted file-level accuracy from 25% to 72% and slashed hallucination frequency from 61% to 17%. A token-density study across 2,688 files confirmed a near-perfect linear relationship between file size and token count (Pearson r = 0.997, roughly 0.250 tokens per byte). All code and data are released for reproducibility.
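That density makes token budgeting simple arithmetic: a file's token cost is about a quarter of its byte size. A quick illustration (the helper is ours, not from the paper):

```python
TOKENS_PER_BYTE = 0.250  # near-linear density reported in the paper (r = 0.997)

def estimated_tokens(size_bytes: int) -> int:
    """Predict a file's token count from its size alone."""
    return round(size_bytes * TOKENS_PER_BYTE)

# A single 1 MB file would consume roughly 262,000 tokens --
# more than many models' entire usable context window.
print(estimated_tokens(1024 * 1024))  # -> 262144
```

This arithmetic also motivates the 1 MB cutoff: any file above it cannot fit in context anyway, so excluding it sacrifices nothing.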
- SizeFilter achieves 79.6% token reduction at <0.01 ms per file decision using only OS stat() metadata.
- HybridFilter reaches 89.3% reduction with the lowest variance; no indexing needed, unlike RepoCoder or GraphRAG (see the sketch after this list).
- Accuracy improved from 25% to 72% and hallucinations dropped from 61% to 17% in an 18-task evaluation with CodeLlama-7B-Instruct.
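The summary doesn't say which signals HybridFilter combines. One plausible reading, sketched here purely as an assumption, pairs the size threshold with an extension denylist for the artifact types named above; the denylist contents and function name are illustrative, not the paper's:

```python
from pathlib import Path

ONE_MB = 1024 * 1024  # same 1 MB threshold as SizeFilter

# Illustrative denylist of non-code artifact extensions; the paper's
# actual HybridFilter components are not specified in this summary.
DENYLIST = {".bin", ".pt", ".safetensors", ".log", ".csv", ".parquet"}

def hybrid_keep(path: Path, max_bytes: int = ONE_MB) -> bool:
    """Keep a file only if it passes both extension and size checks."""
    if path.suffix.lower() in DENYLIST or path.name.endswith(".min.js"):
        return False  # known non-code artifact type, dropped regardless of size
    try:
        return path.stat().st_size <= max_bytes  # still metadata-only
    except OSError:
        return False

# "model.safetensors" and "bundle.min.js" are rejected outright;
# "main.py" is kept as long as it is under 1 MB.
```

Combining two cheap checks like this would explain the lower variance: the denylist catches small-but-useless artifacts that a size threshold alone lets through.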
Why It Matters
A lightweight, no-index filter dramatically cuts context waste and boosts LLM accuracy in real-world code repos, at a cost low enough to run on every request.