Research & Papers

FD-RAG boosts accuracy up to 7.8% and cuts latency 8.4x for edge AI

New federated RAG system runs across edge devices holding fragmented, private data without sharing raw data.

Deep Dive

Retrieval-augmented generation (RAG) typically assumes centralized knowledge and abundant compute. Those assumptions break down on edge devices, where data is fragmented and private and LLM calls are expensive. Researchers from multiple institutions present FD-RAG, a federated dual-system framework that addresses this by separating lightweight memory access from on-demand LLM reasoning. FD-RAG builds semantic-aware adaptive hypergraphs over local corpora and distills them into compact QA memories. At inference time, it answers well-covered queries by direct memory matching, which is fast and cheap, and calls the LLM only when necessary, tracing retrieved memories back to hypergraph-grounded evidence. To preserve privacy, it aggregates anonymized memory representations across devices without exposing raw documents, so devices can share knowledge collaboratively.
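To make the dual-system idea concrete, here is a minimal Python sketch of the routing step only. It assumes a toy bag-of-words cosine similarity in place of whatever encoder FD-RAG actually uses, and a hypothetical confidence threshold `tau`; `embed`, `cosine`, `answer`, and the `llm` callback are illustrative names, not the paper's API. It does not model the hypergraphs or the evidence tracing.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase bag-of-words counts, '?' stripped."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def answer(
    query: str,
    memory: list[tuple[str, str]],
    llm: Callable[[str], str],
    tau: float = 0.8,  # hypothetical routing threshold
) -> tuple[str, str]:
    """Serve from QA memory if the best match clears tau, else fall back to the LLM.

    Returns (route, answer) where route is "memory" or "llm".
    """
    q = embed(query)
    best_score, best_answer = 0.0, None
    for question, ans in memory:
        score = cosine(q, embed(question))
        if score > best_score:
            best_score, best_answer = score, ans
    # Fast path: a well-covered query is answered by memory matching alone.
    if best_answer is not None and best_score >= tau:
        return "memory", best_answer
    # Slow path: only uncovered queries pay for an LLM call.
    return "llm", llm(query)
```

With `memory = [("what is the capital of france", "Paris")]`, the query "What is the capital of France?" matches exactly and is served from memory, while an unrelated question falls through to the `llm` callback. The threshold is the accuracy/latency knob in this sketch: raising `tau` sends more queries to the LLM.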

Experiments on QA benchmarks show FD-RAG improves accuracy by up to 7.8% while reducing latency by 8.4x compared to strong local and federated baselines. The framework also comes with theoretical guarantees, including an O(1/ε²) convergence rate for hypergraph learning. If these results generalize, they could make decentralized, privacy-preserving RAG tractable for real edge deployments: think smartphone assistants, IoT devices, or edge servers with limited GPU access. For tech professionals, FD-RAG points to a future where AI can draw on fragmented, private knowledge without centralizing data or burning through compute budgets.
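One way to read the O(1/ε²) rate, as is common for such results, is as an iteration complexity: the number of hypergraph-learning iterations T needed to reach ε-accuracy is bounded by C/ε² for some problem-dependent constant C. This reading is an assumption, since the summary does not state the theorem's exact form. Under it, tightening the target accuracy tenfold multiplies the bound by 100:

```latex
T(\varepsilon) \le \frac{C}{\varepsilon^{2}}
\qquad\Longrightarrow\qquad
T\!\left(\frac{\varepsilon}{10}\right) \le \frac{C}{(\varepsilon/10)^{2}} = 100 \cdot \frac{C}{\varepsilon^{2}}
```

This scales the worst-case bound, not necessarily the iteration count observed in practice.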

Key Points
  • Decouples memory access from LLM reasoning, using adaptive hypergraphs and QA memories for fast local responses
  • Improves QA accuracy by up to 7.8% while cutting latency 8.4x versus strong local and federated baselines
  • Aggregates anonymized device memories without exposing raw data, enabling privacy-preserving federated knowledge sharing
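The privacy-preserving aggregation can be pictured as an anonymize-then-pool step. The summary does not say how FD-RAG anonymizes memory representations, so this Python sketch swaps in a simple stand-in: each device keeps only the embedding vectors of its QA memories (dropping question/answer text and document IDs) and adds Gaussian noise with a hypothetical scale `sigma`, and the server shuffles the pooled vectors so they are not ordered by device. The function names and the memory dict layout are illustrative assumptions, and additive noise of this kind is not by itself a formal privacy guarantee.

```python
from __future__ import annotations

import random

Vector = list[float]


def anonymize(memories: list[dict], sigma: float, rng: random.Random) -> list[Vector]:
    """Device side (stand-in technique): release only noised embeddings.

    Each memory is assumed to be a dict with an "embedding" key; any text or
    document-ID fields are never read, so they never leave the device.
    """
    return [[x + rng.gauss(0.0, sigma) for x in m["embedding"]] for m in memories]


def aggregate(uploads: list[list[Vector]], rng: random.Random) -> list[Vector]:
    """Server side: pool all device uploads and shuffle away device ordering."""
    pool = [vec for upload in uploads for vec in upload]
    rng.shuffle(pool)
    return pool


if __name__ == "__main__":
    rng = random.Random(0)
    device_a = [{"question": "q1", "answer": "a1", "doc_id": "A-7", "embedding": [0.1, 0.9]}]
    device_b = [{"question": "q2", "answer": "a2", "doc_id": "B-3", "embedding": [0.8, 0.2]}]
    uploads = [anonymize(d, sigma=0.05, rng=rng) for d in (device_a, device_b)]
    print(aggregate(uploads, rng))  # two noised 2-d vectors, no text or IDs
```

Shuffling only removes ordering linkage; in a real deployment the noise scale and any secure-aggregation protocol would carry most of the privacy burden.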

Why It Matters

Enables efficient, private RAG on edge devices — key for decentralized AI assistants and IoT applications.