Research & Papers

LiveNewsBench: Evaluating LLM Web Search Capabilities with Freshly Curated News

A new benchmark reveals which LLMs can actually find fresh news, not just recall old data.

Deep Dive

Researchers have launched LiveNewsBench, a new benchmark designed to rigorously test how well LLMs can search the web for real-time information. It automatically generates fresh question-answer pairs from recent news, forcing models to actually search rather than fall back on training data. The benchmark includes difficult, multi-hop questions and evaluates both commercial and open-source models. A public leaderboard, dataset, and code are now available, addressing a key gap in evaluating agentic AI search capabilities.
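The core freshness idea described above can be sketched in a few lines: only news published after a model's training cutoff is eligible for question generation, so a correct answer implies a real web search. This is a minimal illustration, not the benchmark's actual pipeline; the `Article` type and `make_qa_pair` helper are hypothetical, and a real system would use an LLM to write multi-hop questions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Article:
    # Hypothetical news-article record; not LiveNewsBench's actual schema.
    title: str
    published: datetime
    body: str


def is_fresh(article: Article, training_cutoff: datetime) -> bool:
    """An article counts as 'fresh' only if published after the model's cutoff,
    so its contents cannot have been memorized during training."""
    return article.published > training_cutoff


def make_qa_pair(article: Article) -> dict:
    """Stub QA generation: a real pipeline would prompt an LLM to write a
    (possibly multi-hop) question grounded in the article text."""
    return {
        "question": f"What happened regarding: {article.title}?",
        "answer_source": article.title,
        "published": article.published.isoformat(),
    }


# Illustrative cutoff and articles; the post-cutoff story is the only candidate.
cutoff = datetime(2024, 6, 1, tzinfo=timezone.utc)
articles = [
    Article("Old story", datetime(2023, 1, 15, tzinfo=timezone.utc), "..."),
    Article("Breaking story", datetime(2024, 7, 2, tzinfo=timezone.utc), "..."),
]

fresh_pairs = [make_qa_pair(a) for a in articles if is_fresh(a, cutoff)]
print(len(fresh_pairs))  # → 1
```

Because the eligible pool regenerates as news breaks, the benchmark resists the staleness and contamination that plague fixed QA datasets.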

Why It Matters

This benchmark is crucial for developing AI agents that can reliably answer questions about current events, moving them beyond the static knowledge frozen in at training time.