Research & Papers

SUMMIR: A Hallucination-Aware Framework for Ranking Sports Insights from LLMs

Researchers built a system that uses GPT-4o and other LLMs to extract and rank insights from 7,900 sports articles.

Deep Dive

A research team led by Nitish Kumar has introduced SUMMIR (Sentence Unified Multimetric Model for Importance Ranking), a novel framework designed to extract and rank meaningful insights from sports journalism while actively detecting and filtering out AI hallucinations. The system was trained and tested on a substantial dataset of 7,900 news articles covering 800 matches across Cricket, Soccer, Basketball, and Baseball. To generate insights, the researchers employed multiple state-of-the-art LLMs including GPT-4o, Qwen2.5-72B-Instruct, Llama-3.3-70B-Instruct, and Mixtral-8x7B-Instruct-v0.1, creating a diverse pool of potential analysis points from pre-game and post-game coverage.

SUMMIR's key innovation is a two-layer validation approach. First, it assesses factual accuracy with a FactScore-based methodology that cross-references each generated insight against the source material. Second, it applies the SummaC (Summary Consistency) framework with GPT-4o to detect hallucinations, that is, cases where an LLM generates plausible but incorrect information. The system then ranks the validated insights according to user-specific interests, producing personalized, reliable summaries. The evaluation also revealed significant differences in factual consistency across the LLMs tested, a comparison that may inform future model development.
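The validate-then-rank flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Insight` fields, the threshold values, and the topic-weight relevance score are all hypothetical stand-ins, and the factuality and consistency scores are assumed to have been computed already by FactScore-style and SummaC-style checks.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Insight:
    text: str
    fact_score: float         # assumed: share of claims supported by the source, 0..1
    consistency: float        # assumed: consistency score against the source, 0..1
    topics: Tuple[str, ...]   # hypothetical topic tags attached to the insight


def validate(insights: List[Insight],
             min_fact: float = 0.8,
             min_consistency: float = 0.5) -> List[Insight]:
    """Keep only insights that pass both checks (thresholds are illustrative)."""
    return [
        i for i in insights
        if i.fact_score >= min_fact and i.consistency >= min_consistency
    ]


def rank_for_user(insights: List[Insight],
                  interests: Dict[str, float]) -> List[Insight]:
    """Order insights by summed interest weight of their topics, highest first."""
    def relevance(insight: Insight) -> float:
        return sum(interests.get(topic, 0.0) for topic in insight.topics)

    # sorted() is stable, so equally relevant insights keep their input order.
    return sorted(insights, key=relevance, reverse=True)
```

In this sketch an insight with a low factuality or consistency score never reaches the ranking step, so personalization only reorders material that has already passed both checks.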

The framework represents a significant step toward reliable automated content analysis, addressing one of the most persistent challenges in LLM deployment: the tendency to generate convincing but inaccurate information. By combining multiple validation methods and focusing specifically on hallucination detection, SUMMIR offers a blueprint for building more trustworthy AI systems in journalism and content analysis domains. The researchers have made their source code publicly available, allowing other teams to build upon their work in developing hallucination-resistant AI applications.

Key Points
  • Analyzes 7,900 sports articles across Cricket, Soccer, Basketball, and Baseball using multiple LLMs including GPT-4o and Llama-3.3-70B
  • Checks factual accuracy with a FactScore-based method and detects hallucinations in generated insights with the SummaC framework using GPT-4o
  • Ranks validated insights based on user-specific interests through the SUMMIR architecture for personalized sports analysis

Why It Matters

Provides a blueprint for building hallucination-resistant AI systems that can reliably analyze and summarize complex information from multiple sources.