LARAG: Link-Aware RAG Boosts Accuracy with Fewer Tokens
New retrieval method leverages hyperlinks in docs for smarter, cheaper RAG.
Retrieval-Augmented Generation (RAG) systems typically treat technical documents as flat passages, ignoring the hyperlink structure that humans rely on for navigation. A new paper from researchers at Rulex Platform introduces LARAG (Link-Aware RAG), a lightweight strategy that attaches a document's existing HTML hyperlinks as metadata to its chunk representations. Instead of building an explicit graph, LARAG encodes link relations directly in the chunks, enabling implicit, graph-like retrieval that locally prioritizes contextually connected content.
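To make the idea of links as chunk metadata concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary does not say how LARAG formats or embeds link metadata, so the `build_chunk` helper, the `links_to` field, the bracketed suffix, and the example file paths are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects visible text and <a href> targets from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every anchor that carries an href.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def build_chunk(chunk_id, html):
    """Hypothetical helper: returns a chunk whose embedding text carries its links."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # Join the raw text pieces, then collapse runs of whitespace.
    text = " ".join("".join(parser.text_parts).split())
    links = sorted(set(parser.links))
    # Assumed encoding: append link targets to the text that gets embedded,
    # so linked chunks share tokens without an explicit graph structure.
    embed_text = text
    if links:
        embed_text += "\n[links_to: " + ", ".join(links) + "]"
    return {"id": chunk_id, "text": text, "links_to": links, "embed_text": embed_text}


# Illustrative page and link names, not taken from the Rulex documentation.
chunk = build_chunk(
    "tasks/import.html",
    '<p>Use the <a href="tasks/export.html">Export task</a> after importing.</p>',
)
print(chunk["embed_text"])
```

In this sketch the `embed_text` field is what would be passed to the embedding model, while `links_to` stays available as structured metadata for filtering or re-ranking. Whether LARAG uses either mechanism in this form is not stated in the summary.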
Tested on 20 expert-designed queries with four prompting strategies, LARAG consistently outperformed standard embedding-based RAG, achieving the highest BERTScore F1 while retrieving fewer chunks and generating fewer tokens. In this evaluation, that means answers closer to the reference answers at lower computational cost. The approach is particularly suited to highly linked technical documentation, offering a practical path to better grounding without additional infrastructure or complex preprocessing.
- LARAG encodes hyperlink relations as metadata in chunk representations for implicit graph retrieval.
- Achieved highest BERTScore F1 on 20 expert queries over Rulex Platform technical documentation.
- Retrieved fewer chunks and generated fewer tokens than baseline RAG.
Why It Matters
LARAG makes RAG over heavily linked documentation more efficient and accurate without building an explicit graph or adding complex preprocessing.