Research & Papers

DWTSumm: Discrete Wavelet Transform for Document Summarization

New method treats text as a signal, reaching up to 97% semantic fidelity and reducing hallucinations

Deep Dive

A new research paper from George Washington University introduces DWTSumm, a framework that applies Discrete Wavelet Transform (DWT) to text embeddings for long-document summarization. The method treats text as a semantic signal, decomposing it into approximation (global structure) and detail (local specifics) components. This allows the system to produce compact summaries that retain both overall context and critical domain-specific information, either as standalone summaries or as guides for LLM generation.
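The core idea can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not the paper's implementation): it treats an ordered sequence of sentence embeddings as a multi-channel signal and applies a single-level Haar DWT along the sequence axis, splitting it into approximation coefficients (coarse, global structure) and detail coefficients (local specifics). All function names here are illustrative assumptions.

```python
import math

def haar_dwt(signal):
    """Single-level Haar DWT of a 1-D signal (even length assumed)."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def decompose_embeddings(embeddings):
    """Apply the Haar DWT independently to each embedding dimension.

    embeddings: list of equal-length vectors, one per sentence, in order.
    Returns (approximation, detail), each half the original sequence length.
    """
    dims = list(zip(*embeddings))                  # transpose: one signal per dimension
    per_dim = [haar_dwt(list(d)) for d in dims]
    approx = list(zip(*[a for a, _ in per_dim]))   # back to sequence-major order
    detail = list(zip(*[d for _, d in per_dim]))
    return approx, detail

# Toy "embeddings" for four sentences in 2-D space.
emb = [[1.0, 0.0], [1.0, 2.0], [3.0, 2.0], [3.0, 4.0]]
approx, detail = decompose_embeddings(emb)
```

In a real pipeline the approximation channel would summarize global document structure while the detail channel preserves local, domain-specific content; in practice a library such as PyWavelets and multi-level transforms would replace this single-level Haar sketch.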

Tested on clinical and legal benchmarks, DWTSumm consistently outperformed a GPT-4o baseline. It achieved over 2% improvement in BERTScore, more than 4% in Semantic Fidelity, large METEOR gains, and improved factual consistency in legal tasks. Across multiple embedding models, fidelity reached up to 97%, suggesting DWT acts as a semantic denoising mechanism that reduces hallucinations and strengthens factual grounding. The method is lightweight and generalizable, offering a practical solution for reliable summarization of long, domain-specific documents.

Key Points
  • DWTSumm treats text as a semantic signal, using DWT to decompose embeddings into global (approximation) and local (detail) components.
  • On clinical and legal benchmarks, it improved BERTScore by over 2% and Semantic Fidelity by more than 4% over GPT-4o baselines.
  • Fidelity reached up to 97% across embedding models, indicating strong factual grounding and hallucination reduction.

Why It Matters

DWTSumm offers a lightweight, generalizable way to reduce LLM hallucinations in critical domains like healthcare and law.