ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference
A new adaptive routing system achieves 98.2% accuracy on the MATH benchmark while dramatically cutting compute costs.
A research team led by Siyuan Ma has introduced ODAR-Expert, a groundbreaking framework that fundamentally rethinks how large language models allocate computational resources during reasoning tasks. The system addresses a critical inefficiency in current approaches: most LLM reasoning methods rely on uniform brute-force sampling (like fixed best-of-N or self-consistency) that wastes computation on easy problems while potentially under-investing in difficult ones. ODAR-Expert instead implements principled adaptive routing, using a difficulty estimator grounded in amortized active inference to dynamically decide whether to send a query to a heuristic Fast Agent or a more deliberative Slow Agent. This represents a paradigm shift from simply scaling test-time compute to optimizing resource allocation based on problem complexity.
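The routing idea can be illustrated with a minimal sketch. This is not the authors' implementation: the difficulty proxy, the threshold, and the agent names are hypothetical stand-ins; ODAR-Expert's actual estimator is learned via amortized active inference rather than hand-coded heuristics.

```python
# Illustrative sketch of difficulty-based routing (hypothetical, not ODAR's code).

def estimate_difficulty(query: str) -> float:
    """Toy difficulty proxy: longer, symbol-heavy queries score higher.
    ODAR's real estimator is a learned model; this heuristic only
    demonstrates the routing interface."""
    symbols = sum(ch in "+-*/^=<>" for ch in query)
    return min(1.0, (len(query.split()) + 5 * symbols) / 100)

def route(query: str, threshold: float = 0.5) -> str:
    """Send easy queries to the cheap Fast Agent, hard ones to the
    deliberative Slow Agent (threshold is an assumed tuning knob)."""
    return "slow_agent" if estimate_difficulty(query) >= threshold else "fast_agent"
```

The point of the pattern is that per-query cost becomes a decision variable: simple arithmetic questions never pay the Slow Agent's price, while genuinely hard problems still get the full deliberative budget.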
The technical innovation extends beyond routing to answer selection, where ODAR-Expert introduces a free-energy-principled, risk-sensitive fusion mechanism. Rather than using ad hoc voting over heterogeneous candidates, the system selects answers by minimizing a variational free energy objective that balances log-likelihood with epistemic uncertainty (varentropy). Extensive evaluation across 23 benchmarks demonstrates remarkable performance, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam (HLE), while consistently improving the compute-accuracy frontier. Crucially, the researchers validated reproducibility on a fully open-source stack (Llama 4 + DeepSeek), where ODAR surpassed homogeneous sampling strategies while reducing computational costs by 82%. This suggests that thinking-optimal scaling requires adaptive resource allocation with free-energy-based decision-making rather than simply increasing test-time compute.
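The fusion objective described above can be sketched as follows. The candidate representation (a log-likelihood plus a token-probability distribution per answer) and the weighting `lam` are assumptions for illustration; the paper's actual variational objective is not reproduced here.

```python
import math

def varentropy(probs):
    """Variance of the surprisal -log p under the distribution:
    a measure of epistemic spread around the entropy."""
    pairs = [(p, -math.log(p)) for p in probs if p > 0]
    h = sum(p * s for p, s in pairs)                 # entropy (mean surprisal)
    return sum(p * (s - h) ** 2 for p, s in pairs)   # variance of surprisal

def select_answer(candidates, lam=1.0):
    """Pick the candidate minimizing F = -log-likelihood + lam * varentropy.
    `candidates` maps answer -> (log_likelihood, prob_distribution);
    this shape is a hypothetical stand-in for the paper's objective."""
    free_energy = lambda a: -candidates[a][0] + lam * varentropy(candidates[a][1])
    return min(candidates, key=free_energy)
```

Compared with majority voting, this scoring penalizes candidates whose supporting distribution is erratically uncertain, not just those with lower raw likelihood.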
- Achieves 98.2% accuracy on MATH benchmark and 54.8% on Humanity's Last Exam
- Reduces computational costs by 82% compared to uniform sampling methods
- Uses free-energy-principled fusion mechanism instead of ad hoc voting for answer selection
Why It Matters
Enables dramatically more efficient use of expensive LLMs for complex reasoning tasks, potentially reducing API costs by over 80%.