Research & Papers

DCRC framework slashes numerical hallucinations in financial QA by over 40%

Data-centric compilation stops LLMs from fumbling numbers in finance Q&A.

Deep Dive

Large language models (LLMs) that power financial question-answering (FinQA) systems still hallucinate when they reason over numbers. A new paper by Hao Chen, Xing Tang, and colleagues introduces the Data-centric Reasoning Compiler (DCRC), a framework that shifts the focus from model-centric optimization to data-centric compilation. DCRC operates in three phases. First, adversarial data construction creates training examples with controlled noise. Second, multi-stage training cultivates a Data-centric Structuring Agent (DSA) that performs explicit evidence auditing and program synthesis. Third, a compile-and-execute inference process transforms user queries and retrieved documents into verifiable, executable reasoning programs. The design targets three weaknesses of existing retrieval-augmented generation (RAG) pipelines: sensitivity to noisy retrieval, fragile calculations, and poor auditability.
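To make the compile-and-execute idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the `Evidence` record, the four-field step format, and the `audit` and `execute` helpers are all hypothetical, and the program is written by hand where DCRC would have the DSA synthesize it. The sketch only illustrates the general pattern of auditing evidence first and then running arithmetic deterministically with a trace.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Evidence:
    key: str      # hypothetical identifier such as "revenue_2023"
    value: float  # the number extracted from a retrieved document
    source: str   # where the number came from, kept for the audit trail


# One program step: (output name, operation, left operand, right operand).
Step = Tuple[str, str, str, str]

OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}


def required_inputs(program: List[Step]) -> List[str]:
    """Names the program reads before any step has produced them."""
    produced, needed = set(), []
    for out, _op, left, right in program:
        for name in (left, right):
            if name not in produced and name not in needed:
                needed.append(name)
        produced.add(out)
    return needed


def audit(evidence: List[Evidence], program: List[Step]) -> Dict[str, Evidence]:
    """Keep only evidence the program uses; reject gaps and conflicts."""
    needed = required_inputs(program)
    kept: Dict[str, Evidence] = {}
    for ev in evidence:
        if ev.key not in needed:
            continue  # retrieved but irrelevant: treated as noise
        if ev.key in kept and kept[ev.key].value != ev.value:
            raise ValueError(f"conflicting values for {ev.key}")
        kept[ev.key] = ev
    missing = [k for k in needed if k not in kept]
    if missing:
        raise ValueError(f"missing evidence: {missing}")
    return kept


def execute(program: List[Step], audited: Dict[str, Evidence]) -> Tuple[float, List[str]]:
    """Run the steps with ordinary arithmetic and log each one."""
    if not program:
        raise ValueError("empty program")
    env = {k: ev.value for k, ev in audited.items()}
    trace = [f"{k} = {ev.value} (source: {ev.source})" for k, ev in audited.items()]
    for out, op, left, right in program:
        a, b = env[left], env[right]
        env[out] = OPS[op](a, b)
        trace.append(f"{out} = {op}({left}, {right}) = {env[out]}")
    return env[program[-1][0]], trace


# Made-up figures for illustration only; they do not come from the paper.
evidence = [
    Evidence("revenue_2023", 120.0, "report A, income statement"),
    Evidence("revenue_2022", 100.0, "report A, income statement"),
    Evidence("revenue_2021", 90.0, "report B, income statement"),  # distractor
]
program = [
    ("delta", "sub", "revenue_2023", "revenue_2022"),
    ("growth", "div", "delta", "revenue_2022"),
]

audited = audit(evidence, program)
result, trace = execute(program, audited)
print(result)
print("\n".join(trace))
```

In this toy run the 2021 figure is dropped during the audit, the growth rate is computed as 0.2 by plain arithmetic and not by token generation, and every line of `trace` ties a number to either a source or a named step. That is the property that makes this style of answer checkable after the fact.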

Extensive experiments on offline benchmarks, together with a deployment in a real-world online FinQA system, show that DCRC significantly reduces numerical hallucination rates, by over 40% on complex multi-step calculations, while keeping response latency under 200 ms. The framework's explicit auditing step makes outputs fully traceable, a critical requirement for financial compliance. By treating reasoning as compilation rather than generation, DCRC offers a principled path to trustworthiness in high-stakes AI applications. The work was accepted to the Applied Data Science track at KDD 2026.

Key Points
  • DCRC reduces numerical hallucinations by over 40% on complex multi-step calculations in financial QA compared to standard RAG.
  • Three-phase data-centric approach: adversarial data construction with controlled noise, multi-stage training of the DSA, and compile-and-execute inference.
  • Deployed in production with latency under 200 ms; explicit auditing makes outputs fully traceable for compliance.

Why It Matters

Trustworthy AI for finance: DCRC makes LLM calculations auditable and accurate, enabling safer deployment in trading and compliance.