
NightFeats multi-agent RAG beats Claude and Nova at NeurIPS 2025

arXiv cs.CL June 11, 2026

⚡ Won Best Dynamic Evaluation with a transparent, verifiable architecture, outscoring proprietary black-box models.

Deep Dive

NightFeats, a multi-agent retrieval-augmented generation (RAG) system by Quentin Fever and Naziha Aslam, won Best Dynamic Evaluation in the text-to-text track at the NeurIPS 2025 MMU-RAGent competition. Instead of optimizing for automatic metrics, the system introduces a principled pipeline with three coordinated phases: retrieval, curation, and composition. Each phase is governed by explicit intermediate representations and handoff contracts, inspired by Agentic Context Engineering (ACE). Core innovations include temporal-semantic reranking for relevance, bounded contradiction reconciliation to resolve conflicting sources, and citation-preserving composition for verifiable outputs.
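To make the idea of phases joined by explicit handoff contracts concrete, here is a minimal Python sketch. It is not the NightFeats implementation: the summary above does not publish the system's schemas, so every class and function name here (Passage, RetrievalResult, CuratedEvidence, Answer, run_pipeline) is a hypothetical stand-in, and the keyword matcher is a toy placeholder for a real retriever.

```python
from dataclasses import dataclass
from typing import List

# All names are hypothetical illustrations, not NightFeats' actual schemas.


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str


@dataclass(frozen=True)
class RetrievalResult:
    """Handoff contract: retrieval -> curation."""
    query: str
    passages: List[Passage]


@dataclass(frozen=True)
class CuratedEvidence:
    """Handoff contract: curation -> composition."""
    query: str
    passages: List[Passage]


@dataclass(frozen=True)
class Answer:
    text: str
    citations: List[str]


def retrieve(query: str, corpus: List[Passage]) -> RetrievalResult:
    # Toy keyword overlap standing in for a real retriever.
    terms = set(query.lower().split())
    hits = [p for p in corpus if terms & set(p.text.lower().split())]
    return RetrievalResult(query, hits)


def curate(result: RetrievalResult, max_passages: int = 2) -> CuratedEvidence:
    # Drop repeated documents, then cap the evidence set.
    seen, kept = set(), []
    for p in result.passages:
        if p.doc_id not in seen:
            seen.add(p.doc_id)
            kept.append(p)
    return CuratedEvidence(result.query, kept[:max_passages])


def compose(evidence: CuratedEvidence) -> Answer:
    # Every emitted statement keeps a pointer to its source document.
    text = " ".join(f"{p.text} [{p.doc_id}]" for p in evidence.passages)
    answer = Answer(text, [p.doc_id for p in evidence.passages])
    # Contract check: citations may only reference curated evidence.
    allowed = {p.doc_id for p in evidence.passages}
    if not set(answer.citations) <= allowed:
        raise ValueError("answer cites a source outside the curated evidence")
    return answer


def run_pipeline(query: str, corpus: List[Passage]) -> Answer:
    return compose(curate(retrieve(query, corpus)))
```

Because each phase accepts and returns a typed, immutable record, the intermediate state can be logged or inspected at every handoff, which is one plausible reading of how such contracts support verifiability.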

In competition results, NightFeats surpassed proprietary baselines including Claude-SonnetV2 and Nova-Pro on both LLM-as-a-Judge and Human Likert evaluations. This suggests that architectural transparency and evidence grounding align better with human preferences than narrow optimization for similarity metrics. The work challenges the assumption that larger or closed models are inherently superior and offers a blueprint for building trustworthy, explainable RAG systems.

Key Points

Uses a three-phase pipeline (retrieval, curation, composition) with explicit handoff contracts for verifiability
Introduces temporal-semantic reranking and bounded contradiction reconciliation to improve evidence quality
Outperforms proprietary models like Claude-SonnetV2 and Nova-Pro on human preference and LLM-as-a-Judge evaluations
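As a rough illustration of what temporal-semantic reranking could look like, the sketch below blends cosine similarity with an exponential recency decay. The summary does not give the actual scoring rule, so the weighted-sum formula, the alpha weight, the half_life_days parameter, and the Candidate type are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence


@dataclass
class Candidate:
    # Hypothetical record; field names are not taken from the source.
    doc_id: str
    embedding: Sequence[float]
    published: date


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recency(published: date, today: date, half_life_days: float = 180.0) -> float:
    # Halves for every half_life_days of age; future dates count as age 0.
    age_days = max((today - published).days, 0)
    return 0.5 ** (age_days / half_life_days)


def rerank(
    query_vec: Sequence[float],
    candidates: List[Candidate],
    today: date,
    alpha: float = 0.7,  # assumed weight on semantic similarity
    half_life_days: float = 180.0,
) -> List[Candidate]:
    def score(c: Candidate) -> float:
        semantic = cosine(query_vec, c.embedding)
        temporal = recency(c.published, today, half_life_days)
        return alpha * semantic + (1.0 - alpha) * temporal

    return sorted(candidates, key=score, reverse=True)
```

With alpha set to 1.0 the ranking reduces to plain semantic similarity. Lowering it lets a slightly less similar but much more recent source overtake a stale one.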

Why It Matters

Suggests that transparent, verifiable RAG architectures can beat proprietary black boxes on human preference.
