Neo AI Engineer evaluation reveals RAG tweaks boost quality 19% and cut costs 79%

r/LocalLLaMA May 15, 2026

⚡Most expensive model performed worst; retrieval & deduplication mattered more.

Deep Dive

A common RAG pitfall is a retrieval problem disguised as an LLM problem. The bot's similarity threshold (cosine distance 0.7) was too strict and returned zero documents for casual queries. Logging the retrieved context showed that nothing was being retrieved, so the fault lay in retrieval rather than in the model. Heuristic keyword evaluators gave false confidence; switching to an LLM judge (Claude Haiku via OpenRouter) that scores relevance and accuracy provided real signal for a few cents per run.
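The failure mode above can be reproduced with a minimal sketch. It assumes the 0.7 value acts as a minimum cosine similarity (if the vector store reports cosine distance instead, the comparison flips), and the names `retrieve`, `min_similarity`, and the toy embeddings are hypothetical, not taken from the original bot. The point is the log line: it makes an empty context visible before anyone blames the model.

```python
import logging
import math

logger = logging.getLogger("rag.retrieval")


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, docs, min_similarity=0.7, top_k=3):
    """Return up to top_k (text, score) pairs scoring at least min_similarity.

    docs is a list of (text, embedding) pairs. The 0.7 default mirrors the
    threshold in the write-up; treating it as a similarity is an assumption.
    """
    scored = sorted(
        ((text, cosine_similarity(query_vec, emb)) for text, emb in docs),
        key=lambda pair: pair[1],
        reverse=True,
    )
    kept = [pair for pair in scored if pair[1] >= min_similarity][:top_k]
    best = f"{scored[0][1]:.3f}" if scored else "n/a"
    # Log what was actually retrieved so an empty context is easy to spot.
    logger.info(
        "kept %d of %d docs (best score %s, threshold %.2f)",
        len(kept), len(scored), best, min_similarity,
    )
    if not kept:
        logger.warning("no docs passed the threshold; the LLM will get an empty context")
    return kept


# Toy 3-dimensional embeddings: a casual query that only loosely matches either doc.
docs = [
    ("refund policy", [1.0, 0.0, 0.0]),
    ("shipping times", [0.0, 1.0, 0.0]),
]
casual_query = [0.5, 0.5, 0.7]

strict = retrieve(casual_query, docs)                      # threshold 0.7
relaxed = retrieve(casual_query, docs, min_similarity=0.4)
```

With these toy vectors both documents score about 0.50, so the strict call returns nothing while the relaxed call returns both.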

Deduplicating chunks with >80% token overlap cleaned the context, eliminating hallucination. Stricter grounding (answering only with facts from the docs) improved accuracy but reduced helpfulness on knowledge gaps, a deliberate design choice. In the model sweep, Gemma 4 26B outperformed Gemini 3.1 Flash Lite Preview (7.88 vs 7.33) at 75% lower cost. End result: a 19% quality gain and a 79% cost reduction, achieved through systematic evaluation with Neo AI Engineer.
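The deduplication step can be sketched roughly as follows. The write-up gives the 80% cutoff but not the exact overlap metric, so this version assumes overlap means shared unique tokens divided by the size of the smaller chunk, with naive lowercase whitespace tokenization. The function names are hypothetical.

```python
def token_overlap(a, b):
    """Share of the smaller chunk's unique tokens that also appear in the other chunk."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / min(len(tokens_a), len(tokens_b))


def dedupe_chunks(chunks, max_overlap=0.8):
    """Keep a chunk only if it overlaps at most max_overlap with every chunk kept so far."""
    kept = []
    for chunk in chunks:
        if all(token_overlap(chunk, earlier) <= max_overlap for earlier in kept):
            kept.append(chunk)
    return kept


chunks = [
    "Refunds are issued within 14 days of purchase",
    "refunds are issued within 14 days of purchase today",
    "Shipping takes 3 to 5 business days",
]
unique_chunks = dedupe_chunks(chunks)
```

The pass is greedy and order-dependent: the first of two near-duplicates survives, so ranking chunks by retrieval score before deduplicating keeps the better-scoring copy.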

Key Points

Retrieval similarity threshold (0.7 cosine) blocked casual queries; logging context revealed zero docs returned, not an LLM issue.
LLM judge (Claude Haiku) gave meaningful scores for a few cents per run, unlike misleading keyword matching.
Model sweep: Gemma 4 26B scored 7.88 vs the original model's 7.33, costing 75% less per session.
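An LLM judge of the kind mentioned in the key points might look like the sketch below. The prompt wording, the 1 to 10 scale, the JSON reply format, and the `call_model` parameter are all assumptions for illustration; the original evaluation used Claude Haiku via OpenRouter, and the wiring to that API is deliberately left to the caller here.

```python
import json
import re

# Hypothetical judge prompt; the original prompt is not given in the write-up.
JUDGE_PROMPT = """You are grading a RAG answer.
Question: {question}
Retrieved context: {context}
Answer: {answer}

Rate relevance and accuracy from 1 to 10.
Reply with JSON only: {{"relevance": <int>, "accuracy": <int>}}"""


def judge_answer(question, context, answer, call_model):
    """Score one answer using call_model, any function mapping a prompt string to a reply string."""
    prompt = JUDGE_PROMPT.format(question=question, context=context, answer=answer)
    reply = call_model(prompt)
    # Judges sometimes wrap the JSON in extra prose, so extract the outermost braces.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError(f"judge reply contained no JSON object: {reply!r}")
    scores = json.loads(match.group(0))
    relevance = float(scores["relevance"])
    accuracy = float(scores["accuracy"])
    return {
        "relevance": relevance,
        "accuracy": accuracy,
        "overall": (relevance + accuracy) / 2,
    }


def fake_judge(prompt):
    """Stand-in for a real model call, used only to exercise the parsing."""
    return 'Here is my rating: {"relevance": 8, "accuracy": 7}'


result = judge_answer(
    "How long do refunds take?",
    "Refunds are issued within 14 days of purchase",
    "Refunds arrive within 14 days.",
    fake_judge,
)
```

Passing the model call in as a function keeps the scoring logic testable offline and makes it cheap to swap judge models during a sweep.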

Why It Matters

Practical RAG improvements (better retrieval, evaluation, and model selection) delivered a 19% quality gain with a 79% cost reduction.
