
NightFeats multi-agent RAG beats Claude and Nova at NeurIPS 2025

The system won Best Dynamic Evaluation with a transparent, verifiable architecture, outperforming proprietary black boxes.

Deep Dive

NightFeats, a multi-agent retrieval-augmented generation (RAG) system by Quentin Fever and Naziha Aslam, won Best Dynamic Evaluation in the text-to-text track at the NeurIPS 2025 MMU-RAGent competition. Instead of optimizing for automatic metrics, the system introduces a principled pipeline with three coordinated phases: retrieval, curation, and composition. Each phase is governed by explicit intermediate representations and handoff contracts, inspired by Agentic Context Engineering (ACE). Core innovations include temporal-semantic reranking for relevance, bounded contradiction reconciliation to resolve conflicting sources, and citation-preserving composition for verifiable outputs.
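
To make the idea of phases joined by handoff contracts concrete, here is a minimal Python sketch. It is not the NightFeats implementation: the class names (`Evidence`, `CuratedSet`, `Answer`), the toy word-overlap scoring, and the `max_items` cutoff are all assumptions made for illustration. It only shows how typed intermediate representations let each phase be inspected on its own, and how composition can carry source ids through to the output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Handoff from retrieval to curation (hypothetical schema)."""
    source_id: str
    text: str
    score: float


@dataclass(frozen=True)
class CuratedSet:
    """Handoff from curation to composition (hypothetical schema)."""
    items: tuple[Evidence, ...]


@dataclass(frozen=True)
class Answer:
    """Final output: the answer text plus the source ids it cites."""
    text: str
    citations: tuple[str, ...]


def retrieve(query: str, corpus: dict[str, str]) -> list[Evidence]:
    # Toy lexical retrieval: score is the fraction of query terms found in the document.
    terms = set(query.lower().split())
    hits = []
    for source_id, text in corpus.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            hits.append(Evidence(source_id, text, overlap / len(terms)))
    return hits


def curate(evidence: list[Evidence], max_items: int = 2) -> CuratedSet:
    # Keep the top-scoring items; ties are broken by source id for determinism.
    ranked = sorted(evidence, key=lambda e: (-e.score, e.source_id))
    return CuratedSet(tuple(ranked[:max_items]))


def compose(curated: CuratedSet) -> Answer:
    # Each emitted passage keeps a marker pointing back to its source.
    parts = [f"{e.text} [{e.source_id}]" for e in curated.items]
    return Answer(" ".join(parts), tuple(e.source_id for e in curated.items))


def run_pipeline(query: str, corpus: dict[str, str]) -> Answer:
    return compose(curate(retrieve(query, corpus)))


corpus = {
    "a": "rag systems cite sources",
    "b": "cats sleep a lot",
    "c": "rag pipelines rank sources",
}
answer = run_pipeline("rag sources", corpus)
print(answer.text)
print(answer.citations)
```

Because each phase only accepts and returns a fixed type, a reviewer can log or test the `Evidence` list and the `CuratedSet` separately, which is the kind of verifiability the handoff contracts are meant to provide.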

In competition results, NightFeats surpassed proprietary baselines including Claude-SonnetV2 and Nova-Pro on both LLM-as-a-Judge and Human Likert evaluations. This result suggests that architectural transparency and evidence grounding align better with human preferences than systems narrowly optimized for similarity metrics. The work challenges the assumption that larger or closed models are inherently superior, and it offers a blueprint for building trustworthy, explainable RAG systems.

Key Points
  • Uses a three-phase pipeline (retrieval, curation, composition) with explicit handoff contracts for verifiability
  • Introduces temporal-semantic reranking and bounded contradiction reconciliation to improve evidence quality
  • Outperformed proprietary models like Claude-SonnetV2 and Nova-Pro on human preference and LLM-as-a-Judge evaluations
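
The summary does not give the formula behind temporal-semantic reranking. One plausible reading, sketched below in Python, blends a semantic similarity score with an exponential recency decay. The function names, the `half_life_days` and `weight` parameters, and their default values are assumptions chosen for illustration, not values from the paper.

```python
from datetime import date


def temporal_semantic_score(similarity, published, today,
                            half_life_days=180.0, weight=0.7):
    """Blend semantic similarity with recency (hypothetical scoring rule).

    similarity: value in [0, 1] from any embedding or lexical model.
    The recency term halves every `half_life_days` days.
    """
    age_days = max((today - published).days, 0)  # future dates count as age 0
    recency = 0.5 ** (age_days / half_life_days)
    return weight * similarity + (1.0 - weight) * recency


def rerank(candidates, today, **params):
    """Sort (doc_id, similarity, published) tuples by blended score, best first."""
    return sorted(
        candidates,
        key=lambda c: temporal_semantic_score(c[1], c[2], today, **params),
        reverse=True,
    )


today = date(2025, 12, 1)
candidates = [
    ("old", 0.80, date(2023, 12, 2)),  # slightly more similar, about two years old
    ("new", 0.75, date(2025, 12, 1)),  # slightly less similar, published today
]
for doc_id, _, _ in rerank(candidates, today):
    print(doc_id)
```

With these assumed defaults, the fresher document ranks first even though its raw similarity is a little lower. Setting `weight=1.0` recovers a purely semantic ranking.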

Why It Matters

Suggests that transparent, verifiable RAG architectures can beat proprietary black boxes on human preference, at least in this competition setting.
