Research & Papers

Decomposing Retrieval Failures in RAG for Long-Document Financial Question Answering

A retrieval model fine-tuned for financial filings addresses a critical 'within-document' retrieval failure.

Deep Dive

Researchers Amine Kobeissi and Philippe Langlais have published a paper, 'Decomposing Retrieval Failures in RAG for Long-Document Financial Question Answering,' that tackles a critical reliability problem in AI for finance. The study focuses on a frequent failure mode where RAG systems retrieve the correct long document (like a 10-K filing) but miss the exact page or text chunk containing the answer, forcing the AI to extrapolate incorrectly from incomplete context, a high-stakes error in financial analysis.

The team systematically evaluated retrieval at three granularities (document, page, and chunk) on a 150-question subset of FinanceBench. They compared dense, sparse, hybrid, and hierarchical retrieval methods with reranking. Their key innovation is a domain-fine-tuned 'page scorer', a bi-encoder model specifically trained to assess page-level relevance in financial documents. Unlike generic passage retrieval, this model exploits the inherent semantic coherence of pages within regulatory filings, acting as an intermediate retrieval unit between documents and smaller chunks.
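To make the page-as-intermediate-unit idea concrete, here is a minimal sketch of the retrieval flow: score whole pages against the query first, then chunk only the best page(s) for fine-grained retrieval. The `embed` function below is a toy bag-of-words stand-in for the paper's learned bi-encoder, and the page texts are invented for illustration; they are not from the paper.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a learned bi-encoder embedding: a bag-of-words vector.
    # The paper's page scorer would use a fine-tuned transformer encoder instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, pages: dict[str, str], chunk_size: int = 8, top_pages: int = 1):
    """Score whole pages first, then chunk only the top-scoring pages."""
    q = embed(query)
    ranked = sorted(pages, key=lambda p: cosine(q, embed(pages[p])), reverse=True)
    best = ranked[:top_pages]
    chunks = []
    for page_id in best:
        words = pages[page_id].split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            chunks.append((page_id, chunk, cosine(q, embed(chunk))))
    chunks.sort(key=lambda c: c[2], reverse=True)
    return best, chunks

# Hypothetical filing pages, invented for illustration.
pages = {
    "p12": "net revenue for fiscal 2022 increased to 4.1 billion driven by cloud segment growth",
    "p13": "the board of directors held four meetings during the fiscal year",
}
best_pages, top_chunks = retrieve("what was net revenue in fiscal 2022", pages)
```

The design point is that chunking happens only inside pages that already scored well, so a chunk is never ranked without the page-level context that financial filings tend to preserve.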

This targeted approach addresses a gap identified by their 'oracle analysis,' which showed significant headroom for improvement at the page and chunk level, even when document retrieval succeeded. The practical implication is a substantial boost in the precision of AI systems used for financial due diligence, regulatory compliance checks, and investment research, where retrieving the exact justification is non-negotiable. The work provides a new, more reliable architectural pattern for enterprise RAG deployments handling complex, lengthy documents.
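The headroom finding can be illustrated with a small recall calculation at each granularity. The per-question IDs and gold labels below are invented for illustration (they are not the paper's data); the point is the pattern the oracle analysis revealed: document recall can be perfect while page and chunk recall lag.

```python
def recall_at_k(retrieved: list[str], gold: set[str], k: int) -> float:
    """Fraction of gold evidence units found in the top-k retrieved units."""
    hits = sum(1 for g in gold if g in retrieved[:k])
    return hits / len(gold) if gold else 0.0

# Hypothetical evaluation records: retrieved IDs at each granularity plus gold labels.
questions = [
    {"docs": ["10K-AAPL"], "gold_docs": {"10K-AAPL"},      # right document...
     "pages": ["p3", "p9"], "gold_pages": {"p41"},          # ...wrong page
     "chunks": ["c12", "c30"], "gold_chunks": {"c77"}},     # ...wrong chunk
    {"docs": ["10K-MSFT"], "gold_docs": {"10K-MSFT"},
     "pages": ["p7"], "gold_pages": {"p7"},
     "chunks": ["c5"], "gold_chunks": {"c5"}},
]

for level in ("docs", "pages", "chunks"):
    r = sum(recall_at_k(q[level], q["gold_" + level], k=2) for q in questions) / len(questions)
    print(f"{level:6s} recall@2 = {r:.2f}")
```

On this toy data, document recall is 1.00 while page and chunk recall are 0.50: the gap between the document row and the lower rows is exactly the 'within-document' headroom the page scorer targets.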

Key Points
  • Identifies a critical 'within-document' RAG failure where the right document is found but the specific answer-containing page/chunk is missed.
  • Introduces a domain-fine-tuned 'page scorer' bi-encoder, treating pages as a retrieval unit to exploit semantic coherence in financial filings.
  • Demonstrates significant improvements in page recall and chunk retrieval on FinanceBench, providing a more reliable blueprint for enterprise QA systems.

Why It Matters

Directly improves the reliability of AI for high-stakes financial analysis, compliance, and research by ensuring answers are grounded in the correct evidence.