Research & Papers

[D] Tested model routing on financial AI datasets — good savings and curious what benchmarks others use.

New benchmark reveals intelligent routing can cut LLM costs by 60% on financial tasks without sacrificing quality.

Deep Dive

A new benchmark study reveals that intelligent model routing can deliver substantial cost savings for financial AI applications without compromising quality. The research tested two routing strategies on four financial datasets from HuggingFace's AdaptLLM/finance-tasks collection: FiQA-SA for sentiment analysis, Financial Headlines for classification, FPB for formal news sentiment, and ConvFinQA for multi-turn Q&A on 10-K filings. The baseline sent every task to Claude Opus, while the test strategies routed simple prompts to cheaper models such as Claude Haiku and medium-complexity prompts to either Claude Sonnet or open-source alternatives like Qwen 3.5 27B and Gemma 3 27B.
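The core idea can be sketched as a classify-then-dispatch step. The heuristic, thresholds, and model names below are illustrative assumptions, not the study's actual router:

```python
# Hypothetical complexity-based router, NOT the benchmark's implementation.
# The cue list, word-count threshold, and model identifiers are assumptions.

def classify_complexity(prompt: str) -> str:
    """Toy heuristic: bucket a prompt by rough signals of reasoning depth."""
    reasoning_cues = ("implied", "adjust", "across", "compare", "derive")
    if any(cue in prompt.lower() for cue in reasoning_cues):
        return "complex"
    if len(prompt.split()) > 50:
        return "medium"
    return "simple"

ROUTE = {
    "simple": "claude-haiku",   # cheap model for lookups / sentiment labels
    "medium": "claude-sonnet",  # or an open-source ~27B alternative
    "complex": "claude-opus",   # multi-step financial reasoning
}

def route(prompt: str) -> str:
    """Return the model a prompt would be dispatched to."""
    return ROUTE[classify_complexity(prompt)]
```

In practice a production router would likely use a learned classifier or a small LLM as the judge rather than keyword heuristics, but the dispatch structure is the same.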

The results showed dramatic savings across all tasks, with the intra-provider strategy (routing within Anthropic's models) achieving 58-78% reductions on three datasets and the flexible strategy (including open-source models) reaching up to 89% savings. Most notably, ConvFinQA—a complex multi-turn Q&A dataset—still showed 58% savings because the routing system correctly identified that many questions within lengthy 10-K documents were simple lookups rather than complex reasoning tasks. For example, "What was operating cash flow in 2014?" could be handled by Haiku, while "What is the implied effective tax rate adjustment across three years?" required Opus's advanced reasoning capabilities.
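The savings figures fall out of simple blended-cost arithmetic: the cheaper the models that absorb most of the traffic, the lower the expected cost versus an all-Opus baseline. The per-token prices and routing mix below are placeholder assumptions for illustration, not figures from the study:

```python
# Back-of-envelope blended-cost calculation. Prices ($/1M input tokens) and
# the routing split are ASSUMED round numbers, not the benchmark's data.

def blended_cost(mix: dict[str, float], price: dict[str, float]) -> float:
    """Expected cost per 1M tokens given the fraction routed to each model."""
    return sum(frac * price[m] for m, frac in mix.items())

price = {"opus": 15.0, "sonnet": 3.0, "haiku": 0.25}  # assumed prices
baseline = price["opus"]                              # everything to Opus
mix = {"haiku": 0.6, "sonnet": 0.3, "opus": 0.1}      # hypothetical split

routed = blended_cost(mix, price)
savings = 1 - routed / baseline   # fraction saved vs. the all-Opus baseline
```

With these made-up numbers the routed blend costs $2.55 per 1M tokens against a $15.00 baseline, i.e. roughly 83% savings, which shows how even a modest share of traffic kept on the top model still leaves large headroom.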

The study highlights the importance of task-specific routing decisions in enterprise AI deployments, particularly for financial applications where both accuracy and cost efficiency matter. While the research focused on the financial vertical and noted limitations on long-form tasks such as ECTSum earnings-call transcripts, it demonstrates that intelligent routing can significantly reduce AI spending. The findings suggest that companies running multiple LLMs could adopt similar routing strategies to cut costs while maintaining performance, especially for workloads with mixed-complexity tasks.

Key Points
  • Intelligent routing achieved ~60% average cost savings across four financial AI datasets
  • ConvFinQA multi-turn Q&A showed 58% savings despite complexity, as many questions were simple lookups
  • Flexible routing with open-source models (Qwen 3.5 27B) achieved up to 89% savings on sentiment tasks

Why It Matters

Enterprise AI teams can dramatically reduce costs while maintaining quality by implementing smart model routing based on task complexity.