Higress-RAG: A Holistic Optimization Framework for Enterprise Retrieval-Augmented Generation via Dual Hybrid Retrieval, Adaptive Routing, and CRAG

New architecture combines dual hybrid retrieval, adaptive routing, and CRAG, aiming to cut both hallucinations and latency.

Deep Dive

Researcher Weixi Lin has published a paper detailing Higress-RAG, an enterprise-focused framework for moving Retrieval-Augmented Generation (RAG) systems from proof-of-concept to production. It targets three persistent problems: low retrieval precision on complex queries, high hallucination rates in generation, and latency too high for real-time applications. The paper proposes a 'Full-Link Optimization' strategy built on the Model Context Protocol (MCP) that orchestrates a pipeline of adaptive routing, semantic caching, hybrid retrieval, and Corrective RAG (CRAG).
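To make the corrective step concrete, here is a minimal Python sketch of the gating logic that CRAG-style systems use: a retrieval evaluator scores each retrieved passage, and those scores decide whether to trust the retrieval, discard it, or supplement it. The thresholds, the names `crag_action` and `build_context`, and the `fallback` callable are illustrative assumptions; this summary does not describe how Higress-RAG implements its evaluator or which fallback source it uses.

```python
def crag_action(scores, upper=0.7, lower=0.3):
    """Classify a retrieval result from per-passage relevance scores.

    upper and lower are hypothetical thresholds chosen for illustration.
    """
    if not scores:
        return "incorrect"      # nothing retrieved, so nothing to trust
    best = max(scores)
    if best >= upper:
        return "correct"        # at least one passage looks relevant
    if best < lower:
        return "incorrect"      # every passage looks irrelevant
    return "ambiguous"          # evidence is mixed


def build_context(passages, scores, fallback, upper=0.7, lower=0.3):
    """Pick the context handed to the generator.

    fallback is a hypothetical zero-argument callable that returns
    passages from a secondary source.
    """
    action = crag_action(scores, upper, lower)
    # Keep only passages that are not clearly irrelevant.
    kept = [p for p, s in zip(passages, scores) if s >= lower]
    if action == "correct":
        return kept
    if action == "incorrect":
        return fallback()
    return kept + fallback()    # ambiguous: combine both sources
```

The point of the gate is that the generator never sees passages the evaluator considers clearly irrelevant, which is how a corrective step can lower hallucination rates.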

The implementation introduces the Higress-Native Splitter for structure-aware data ingestion and applies Reciprocal Rank Fusion (RRF) to merge the ranked results of dense and sparse retrieval. It also includes a Semantic Caching mechanism with dynamic thresholding and a reported latency of 50 ms, which matters for real-time use. The system is evaluated on Higress's own technical documentation, where the paper reports robust performance. By optimizing the whole retrieval lifecycle, from pre-retrieval query rewriting to post-retrieval corrective evaluation, Higress-RAG is presented as a scalable, production-ready architecture aimed at making enterprise AI deployments more reliable and efficient.
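As an illustration of the fusion step, the sketch below implements standard Reciprocal Rank Fusion in Python: each document receives a score of 1 / (k + rank) from every ranked list it appears in, and the scores are summed. The constant k = 60 is the commonly used default for RRF, not a value taken from the paper, and the document IDs are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a common default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a dense (embedding) and a sparse (keyword) retriever.
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense, sparse])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only rank positions, it does not need the dense and sparse scores to be on a comparable scale, which is the usual reason for choosing it in hybrid retrieval.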

Key Points
  • Architecture built on Model Context Protocol (MCP) with a 'Full-Link Optimization' strategy for end-to-end performance.
  • Features Semantic Caching with a reported 50 ms latency and dual hybrid retrieval using Reciprocal Rank Fusion (RRF).
  • Integrates Corrective RAG (CRAG) to actively reduce hallucination rates in the final generation phase.
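
The semantic caching listed among the key points can be sketched as follows. This is a toy Python version under stated assumptions: embeddings are plain lists of floats supplied by the caller, the lookup is a linear scan rather than a vector index, and the "dynamic" rule (a stricter threshold for very short queries) is a hypothetical stand-in, since this summary does not say how Higress-RAG adjusts its threshold.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Returns a cached answer when a new query is close enough to an old one."""

    def __init__(self, base_threshold=0.90):
        self.base_threshold = base_threshold  # assumed value, not from the paper
        self.entries = []                     # list of (embedding, answer)

    def threshold_for(self, query):
        # Hypothetical dynamic rule: short queries are more ambiguous,
        # so they must match a cached query more closely.
        bump = 0.05 if len(query.split()) < 4 else 0.0
        return min(0.99, self.base_threshold + bump)

    def put(self, embedding, answer):
        self.entries.append((embedding, answer))

    def get(self, query, embedding):
        best_score, best_answer = 0.0, None
        for cached_embedding, answer in self.entries:
            score = cosine(embedding, cached_embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        if best_answer is not None and best_score >= self.threshold_for(query):
            return best_answer
        return None  # cache miss: fall through to the full RAG pipeline


cache = SemanticCache()
cache.put([1.0, 0.0], "Enable the plugin in the gateway console.")
hit = cache.get("how do I enable the gateway plugin", [0.99, 0.1])
miss = cache.get("enable plugin", [0.9, 0.4])
```

Here `hit` returns the cached answer because the similarity clears the base threshold, while `miss` is `None` because the two-word query is held to the stricter threshold. A hit skips retrieval and generation entirely, which is where the latency saving comes from.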

Why It Matters

Higress-RAG offers a blueprint for moving enterprise RAG systems from fragile prototypes to scalable, low-latency production deployments.