Research & Papers

Argus-Retriever achieves state-of-the-art visual document retrieval with query-aware embeddings

New open model tops leaderboards while training on roughly 9% of available public supervision

Deep Dive

Argus-Retriever, introduced by a team of researchers, departs from standard visual document retrieval by making document embeddings query-dependent. Late-interaction models such as ColPali and ColQwen produce the same page representation regardless of the query, which limits precision when queries target different kinds of evidence, such as tables, charts, or layout-dependent content. Argus addresses this by adding a region-aware Mixture-of-Experts (MoE) module on top of Qwen3.5-VL. The query encoder produces both retrieval embeddings and a compact context vector, while each document page is pooled into spatial regions. A query-aware router then selects latent experts for each region before MaxSim scoring, yielding a multi-vector index that stays compatible with ColPali-style retrieval but now varies with the query.
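The region pooling, query-aware routing, and MaxSim steps described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the average-pooled region grid, the linear router over concatenated region and context features, the top-k softmax gating, the linear experts, and every name and size here (`pool_regions`, `route_regions`, `N_EXPERTS`, `TOP_K`, `CTX`, and a 128-dimensional toy head standing in for the 1024-dimensional one) are assumptions chosen for illustration, with random untrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 128         # toy head size (assumption); the article's retrieval head is 1024-d
N_EXPERTS = 8   # hypothetical number of latent experts
TOP_K = 2       # hypothetical number of experts selected per region
CTX = 32        # hypothetical size of the compact query context vector


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def pool_regions(patches: np.ndarray, grid: int) -> np.ndarray:
    """Average-pool an (H, W, D) patch grid into grid * grid region vectors."""
    h, w, d = patches.shape
    rh, rw = h // grid, w // grid
    cropped = patches[: rh * grid, : rw * grid]
    blocks = cropped.reshape(grid, rh, grid, rw, d)
    return blocks.mean(axis=(1, 3)).reshape(grid * grid, d)


def route_regions(regions, query_ctx, w_router, experts):
    """Hypothetical router: mix each region's TOP_K experts, chosen using the query context."""
    n = regions.shape[0]
    ctx = np.broadcast_to(query_ctx, (n, query_ctx.shape[0]))
    logits = np.concatenate([regions, ctx], axis=1) @ w_router      # (n, N_EXPERTS)
    top = np.argsort(logits, axis=1)[:, -TOP_K:]                    # indices of the TOP_K largest logits
    top_logits = np.take_along_axis(logits, top, axis=1)
    gates = np.exp(top_logits - top_logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)                       # softmax over the chosen experts
    all_proj = np.einsum("nd,kde->nke", regions, experts)           # (n, N_EXPERTS, D)
    chosen = np.take_along_axis(all_proj, top[:, :, None], axis=1)  # (n, TOP_K, D)
    return l2_normalize((gates[:, :, None] * chosen).sum(axis=1))


def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction score: best-matching doc vector per query vector, summed."""
    sims = query_vecs @ doc_vecs.T  # (n_query_vectors, n_doc_vectors)
    return float(sims.max(axis=1).sum())


# Random, untrained weights purely for illustration.
w_router = rng.normal(size=(D + CTX, N_EXPERTS))
experts = rng.normal(size=(N_EXPERTS, D, D)) / np.sqrt(D)
patches = rng.normal(size=(32, 32, D))   # one page's patch embeddings
regions = pool_regions(patches, grid=4)  # 16 region vectors

query_vecs = l2_normalize(rng.normal(size=(6, D)))  # stand-in query token embeddings
ctx_a, ctx_b = rng.normal(size=CTX), rng.normal(size=CTX)

page_for_a = route_regions(regions, ctx_a, w_router, experts)
page_for_b = route_regions(regions, ctx_b, w_router, experts)
print("MaxSim score:", maxsim(query_vecs, page_for_a))
print("Same page vectors for both contexts?", np.allclose(page_for_a, page_for_b))
```

Because the region vectors are recomputed from each query's context vector, the same page yields different vectors for different queries, while the final score is still ordinary ColPali-style MaxSim over a set of vectors.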

Despite using a compact 1024-dimensional retrieval head (versus 2560 or 4096 dimensions in recent state-of-the-art systems) and training on roughly 9% of the available public supervision, the 9B model reaches state-of-the-art results: 92.67 NDCG@5 on ViDoRe V1 and 86.0 NDCG@5 on the combined V1+V2 leaderboard. When plugged into a Qwen3.6-27B agentic retrieval pipeline on ViDoRe V3, Argus-9B raises the pipeline's NDCG@10 from 60.28 to 64.80, showing that it works both as a strong standalone retriever and as a search primitive for iterative LLM agents. These results suggest that query-conditioned late interaction can reach higher accuracy with substantially less training data and smaller embeddings, which could lower deployment costs.
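NDCG@5 and NDCG@10, the metrics quoted above, reward placing relevant pages near the top of the ranking. A minimal Python sketch of NDCG@k for a single query, assuming the common linear-gain formulation with a log2(rank + 1) discount; the ViDoRe evaluators may differ in details such as the gain function or tie handling, and `ndcg_at_k` and the toy page ids are hypothetical:

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain: the item at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, qrels, k):
    """NDCG@k for one query.

    ranked_ids: retrieved page ids, best first.
    qrels: {page_id: graded relevance} from the relevance judgments.
    """
    gains = [qrels.get(pid, 0) for pid in ranked_ids]
    idcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)  # best achievable ranking
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# The only relevant page is retrieved at rank 2 instead of rank 1.
print(round(ndcg_at_k(["p1", "p3", "p2"], {"p3": 1}, k=5), 4))  # 0.6309
```

Under this reading, a leaderboard figure such as 92.67 is the per-query value averaged over the benchmark and expressed as a percentage.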

Key Points
  • Argus-9B achieves 86.0 NDCG@5 on the combined ViDoRe V1+V2 leaderboard, the highest for any open late-interaction model.
  • Uses a query-conditioned, region-aware MoE to produce document embeddings that adapt to each query, unlike fixed-embedding systems such as ColPali and ColQwen.
  • Trained on roughly 9% of available public supervision and uses a 1024-dimensional head, yet outperforms systems with larger 2560- to 4096-dimensional heads.
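To make the embedding-size point concrete: raw storage for a multi-vector index grows linearly with the head dimension. A back-of-the-envelope sketch, assuming the same number of vectors per page for every head, 16-bit values, and no compression; the corpus size, vector count, and `index_bytes` helper are hypothetical:

```python
def index_bytes(num_pages: int, vectors_per_page: int, dim: int, bytes_per_value: int = 2) -> int:
    """Raw multi-vector index size, ignoring compression, quantization, and metadata."""
    return num_pages * vectors_per_page * dim * bytes_per_value


NUM_PAGES = 1_000_000   # hypothetical corpus size
VECS_PER_PAGE = 256     # hypothetical, held equal across heads for a like-for-like comparison

baseline = index_bytes(NUM_PAGES, VECS_PER_PAGE, 1024)
for dim in (1024, 2560, 4096):
    size = index_bytes(NUM_PAGES, VECS_PER_PAGE, dim)
    print(f"{dim:>4}-d head: {size / 2**30:8,.1f} GiB; 1024-d index is {baseline / size:.0%} of this")
```

At equal vector counts, a 1024-dimensional head needs 40% of the storage of a 2560-dimensional one and 25% of a 4096-dimensional one; actual savings also depend on how many vectors each system keeps per page.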

Why It Matters

Query-adaptive retrieval improves accuracy on complex document searches while reducing training data and embedding size, which could lower barriers to enterprise deployment.