DWTSumm: Discrete Wavelet Transform for Document Summarization
New method treats text as a signal, reaching up to 97% semantic fidelity and reducing hallucinations
A new research paper from George Washington University introduces DWTSumm, a framework that applies Discrete Wavelet Transform (DWT) to text embeddings for long-document summarization. The method treats text as a semantic signal, decomposing it into approximation (global structure) and detail (local specifics) components. This allows the system to produce compact summaries that retain both overall context and critical domain-specific information, either as standalone summaries or as guides for LLM generation.
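The paper's full pipeline isn't reproduced here, but the core decomposition can be sketched with a one-level Haar DWT, the simplest wavelet transform. The sketch below assumes a 1-D sequence of embedding values (a stand-in for real sentence embeddings); the averaging filter yields the approximation (global structure) coefficients and the differencing filter yields the detail (local specifics) coefficients:

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT: split a 1-D signal into approximation
    (low-frequency, global trend) and detail (high-frequency, local
    variation) coefficients."""
    if len(signal) % 2:                     # pad odd-length input
        signal = list(signal) + [signal[-1]]
    s = 1 / math.sqrt(2)
    pairs = list(zip(signal[::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]  # pairwise averages
    detail = [(a - b) * s for a, b in pairs]  # pairwise differences
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: reconstructs the (padded) signal exactly."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

# Toy "embedding trajectory" standing in for per-sentence embeddings.
emb = [0.9, 1.1, 1.0, 0.8, 0.1, -0.1, 0.0, 0.2]
approx, detail = haar_dwt(emb)
```

Keeping the approximation coefficients halves the sequence length while preserving the global shape, which is the intuition behind compact summaries that still retain overall context; in practice a library such as PyWavelets would be used rather than this hand-rolled version.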
Tested on clinical and legal benchmarks, DWTSumm consistently outperformed a GPT-4o baseline. It achieved over 2% improvement in BERTScore, more than 4% in Semantic Fidelity, large METEOR gains, and improved factual consistency in legal tasks. Across multiple embedding models, fidelity reached up to 97%, suggesting DWT acts as a semantic denoising mechanism that reduces hallucinations and strengthens factual grounding. The method is lightweight and generalizable, offering a practical solution for reliable summarization of long, domain-specific documents.
- DWTSumm treats text as a semantic signal, using DWT to decompose embeddings into global (approximation) and local (detail) components.
- On clinical and legal benchmarks, it improved BERTScore by over 2% and Semantic Fidelity by more than 4% over GPT-4o baselines.
- Fidelity reached up to 97% across embedding models, indicating strong factual grounding and hallucination reduction.
Why It Matters
A lightweight method to reduce LLM hallucinations in critical domains like healthcare and law.