Researchers benchmark 10 RAG strategies, find BM25 beats dense retrieval for financial data

arXiv cs.IR April 03, 2026

⚡ A new study of 23,088 queries shows that a hybrid retrieval + reranking pipeline achieves a Recall@5 of 0.816.

Deep Dive

A team of researchers has published a benchmark comparing modern retrieval methods for RAG systems that handle documents with mixed text and tabular data, a common format in finance, science, and business. Their study, "From BM25 to Corrective RAG," evaluated ten strategies, including sparse retrieval (BM25), dense retrieval, hybrid fusion, cross-encoder reranking, and adaptive methods, on a dataset of 23,088 queries over 7,318 financial documents. The key finding is that a two-stage pipeline delivered the best performance: it first uses hybrid retrieval to collect candidate documents, then applies a neural reranker to reorder them. That pipeline reached a Recall@5 of 0.816 and an MRR@3 of 0.605, outperforming every single-stage method tested.
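The two-stage design described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the article does not say which fusion method the paper uses, so reciprocal rank fusion (RRF) is assumed here, and the names `reciprocal_rank_fusion`, `two_stage_retrieve`, `rerank_score`, and `n_candidates` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def two_stage_retrieve(
    query: str,
    sparse_ranking: Sequence[str],
    dense_ranking: Sequence[str],
    rerank_score: Callable[[str, str], float],
    n_candidates: int = 20,
    top_k: int = 5,
) -> List[str]:
    """Stage 1: fuse sparse and dense rankings. Stage 2: reorder candidates with a reranker."""
    candidates = reciprocal_rank_fusion([sparse_ranking, dense_ranking])[:n_candidates]
    # rerank_score stands in for a cross-encoder that scores (query, document) pairs.
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:top_k]
```

In practice `sparse_ranking` would come from a BM25 index, `dense_ranking` from an embedding index, and `rerank_score` from a cross-encoder model. The reranker only sees the fused candidates, which keeps the expensive model off the full collection.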

One of the most surprising results challenges prevailing wisdom in AI: the classic BM25 algorithm, a statistical keyword-matching method, outperformed state-of-the-art dense retrieval (semantic search) on this financial QA benchmark. This suggests that for precise numerical and factual queries in structured domains, semantic matching is not always superior to traditional keyword search. The study also found that popular query expansion techniques like HyDE provided limited benefit for these precise queries, while adding contextual information to the retrieval index yielded more consistent gains. The authors provide actionable recommendations for balancing cost and accuracy and have released their full benchmark code to help practitioners build more effective RAG systems for tabular data.
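To see why keyword matching can do well on exact figures, here is a minimal sketch of BM25 scoring. It assumes the common Okapi formulation with a Lucene-style IDF, default parameters `k1=1.5` and `b=0.75`, and whitespace tokenization; the paper's actual BM25 configuration is not given in this article, and the three toy documents are invented for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query_tokens: List[str], docs_tokens: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return one BM25 score per document for the given query tokens."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy corpus (hypothetical): only exact token overlap with the query earns a score.
docs = [
    "net revenue 2019 was 4,512 million".split(),
    "net revenue 2018 was 4,120 million".split(),
    "operating income grew in 2019".split(),
]
scores = bm25_scores("revenue 2019".split(), docs)
best = max(range(len(docs)), key=lambda i: scores[i])
```

Because a token such as `2019` either matches or does not, the document containing both query terms ranks first. A dense encoder has no such hard constraint on years or amounts, which is one plausible reason for the result reported above.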

Key Points

A hybrid retrieval + neural reranking pipeline achieved top scores (Recall@5: 0.816, MRR@3: 0.605) on a 23,088-query financial benchmark.
The BM25 algorithm outperformed modern dense retrieval methods, challenging the assumption that semantic search is always better.
Query expansion methods like HyDE showed limited value for precise numerical queries, while contextual retrieval improvements were more reliable.
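
For readers unfamiliar with the two metrics quoted above, the following sketch computes them under their standard definitions. The article does not spell out the exact variants the paper uses, so treat this as an assumption; the function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Iterable[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def mrr_at_k(ranked_ids: Sequence[str], relevant_ids: Iterable[str], k: int) -> float:
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_metric(metric, runs: List[Tuple[Sequence[str], Iterable[str]]], k: int) -> float:
    """Average a per-query metric over (ranked_ids, relevant_ids) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

With one gold document per query, Recall@5 is simply the share of queries whose gold document appears in the top five, and MRR@3 rewards placing it first rather than second or third.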

Why It Matters

The benchmark gives developers data-driven architecture guidance for building accurate RAG systems in finance, reporting, and any other domain that mixes text with tables.
