Agent Frameworks

FinRAG-12B: A Production-Validated Recipe for Grounded Question Answering in Banking

3-5x faster, 20-50x cheaper, and deployed at 40+ financial institutions.

Deep Dive

A team of researchers including Denys Katerenchuk, Pablo Duboue, and others has published FinRAG-12B, a production-validated recipe for grounded question answering in banking. The 12-billion-parameter model targets the industry's critical demands: high accuracy, regulatory compliance, and verifiable responses. Its training pipeline combines LLM-as-a-Judge filtering, citation annotation, and curriculum learning over only 143M tokens, a fraction of typical fine-tuning datasets. The resulting model outperforms GPT-4.1 on citation grounding and introduces a calibrated refusal mechanism: by training on 22% unanswerable examples, the model reaches a 12% refusal rate, correcting the base model's unsafely low 4.3% (answering questions it should decline) while avoiding GPT-4.1's over-refusal at 20.2%.
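The refusal calibration comes from the composition of the training data itself: hold the share of unanswerable examples at a fixed fraction (22% in the paper's recipe) and the fine-tuned model's refusal rate follows. A minimal sketch of that mixing step, with hypothetical function and variable names (the paper's actual pipeline also applies LLM-as-a-Judge filtering and curriculum ordering, which are omitted here):

```python
import random

def build_training_mix(answerable, unanswerable, unanswerable_frac=0.22, seed=0):
    """Mix answerable and unanswerable QA examples at a fixed ratio.

    Training on a controlled share of unanswerable queries is what
    calibrates the model's "I don't know" rate; the 22% default follows
    the figure reported for FinRAG-12B.
    """
    rng = random.Random(seed)
    # How many unanswerable examples yield the target fraction of the final mix.
    n_unans = round(len(answerable) * unanswerable_frac / (1 - unanswerable_frac))
    mix = list(answerable) + rng.sample(list(unanswerable), min(n_unans, len(unanswerable)))
    rng.shuffle(mix)
    return mix
```

For example, 780 answerable examples would be paired with 220 sampled unanswerable ones, so refusals make up exactly 22% of the 1,000-example mix.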

The system is already deployed at over 40 financial institutions, achieving a statistically significant 7.1 percentage point improvement in query resolution. Beyond accuracy, FinRAG-12B offers dramatic cost and speed advantages: it delivers 3-5x faster responses at 20-50x lower cost compared to GPT-4.1. The end-to-end methodology covers everything from data curation to quantized serving, making it a practical blueprint for regulated industries. This work was submitted to ACL 2026 and provides a compelling case for specialized, grounded LLMs over general-purpose frontier models in high-stakes domains.
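In a deployment like this, citation grounding is typically enforced by checking that every citation in a generated answer resolves to a retrieved passage, with a refusal counting as a valid safe response. A minimal sketch of such a check; the `[n]` citation format and the refusal string are illustrative assumptions, not details from the paper:

```python
import re

# Assumed refusal phrasing; the actual model's refusal text may differ.
REFUSAL = "I don't know based on the provided documents."

def check_grounding(answer: str, passage_ids: list[int]) -> bool:
    """Return True if the answer is a refusal, or cites only retrieved passages.

    An answer with no citations at all, or with a citation outside the
    retrieved set, fails the grounding check.
    """
    if answer.strip() == REFUSAL:
        return True  # calibrated refusal is a valid, safe response
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and cited <= set(passage_ids)
```

A verifier of this shape can run on every response at serving time, which is one way a grounded model stays auditable for regulators.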

Key Points
  • FinRAG-12B outperforms GPT-4.1 on citation grounding using only 143M tokens for training.
  • Calibrated refusal mechanism: a 12% "I don't know" rate, between the base model's unsafely low 4.3% and GPT-4.1's 20.2% over-refusal.
  • Deployed at 40+ institutions; delivers 3-5x faster responses and 20-50x lower cost than GPT-4.1.

Why It Matters

Specialized, grounded AI models can outperform general-purpose giants in regulated, high-stakes domains like banking.