MUDY: Multi-Granular Dynamic Candidate Contextualization for Unsupervised Keyphrase Extraction
Researchers combine prompt scoring with self-attention to capture local context.
A new paper from researchers Hyeongu Kang and Susik Yoon presents MUDY (Multi-Granular Dynamic Candidate Contextualization), a framework for unsupervised keyphrase extraction. Existing methods built on pre-trained language models (PLMs) often score candidates by global semantic relevance to the document, so they can miss keyphrases that matter locally, within a specific subtopic. MUDY addresses this with two complementary components. The first is a prompt-based scoring mechanism that estimates each candidate's generation likelihood and applies candidate-aware weighting to reflect local context. The second is a self-attention-based scoring mechanism that uses multi-granular attention patterns from the PLM to measure a candidate's significance at both the document-wide and segment-specific levels. Evaluated on four real-world datasets, MUDY consistently outperforms existing state-of-the-art baselines across a range of top-k cutoffs.
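To make the idea of scoring at two granularities concrete, here is a minimal sketch. It is not the paper's formulation: the function names, the `alpha` blending rule, and the toy attention matrix are all assumptions made for illustration, and the prompt-based likelihood component is not modeled at all. The sketch only shows how a candidate can look less important document-wide while being just as salient inside its own segment.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def attention_received(attn: Matrix, sources: Sequence[int], targets: Sequence[int]) -> float:
    """Mean attention weight flowing from `sources` tokens to `targets` tokens."""
    total = sum(attn[i][j] for i in sources for j in targets)
    return total / (len(sources) * len(targets))


def multi_granular_score(
    attn: Matrix, segment_ids: Sequence[int], span: Sequence[int], alpha: float = 0.5
) -> float:
    """Blend a document-wide and a segment-level attention score.

    The linear blend with `alpha` is a hypothetical rule, not taken from the paper.
    Assumes every token of `span` lies in the same segment as its first token.
    """
    all_tokens = range(len(attn))
    segment = segment_ids[span[0]]
    same_segment = [i for i, s in enumerate(segment_ids) if s == segment]
    global_score = attention_received(attn, all_tokens, span)
    local_score = attention_received(attn, same_segment, span)
    return alpha * global_score + (1.0 - alpha) * local_score


# Toy 4-token document: attn[i][j] is the attention token i pays to token j.
# Tokens 0-1 form segment 0, tokens 2-3 form segment 1 (made-up numbers).
attn = [
    [0.5, 0.3, 0.1, 0.1],
    [0.5, 0.3, 0.1, 0.1],
    [0.4, 0.0, 0.1, 0.5],
    [0.4, 0.0, 0.1, 0.5],
]
segment_ids = [0, 0, 1, 1]

for token in (0, 3):
    global_only = multi_granular_score(attn, segment_ids, [token], alpha=1.0)
    blended = multi_granular_score(attn, segment_ids, [token], alpha=0.5)
    print(token, global_only, blended)
```

In this toy setup, token 0 receives attention from the whole document, while token 3 receives it mostly from its own segment. With `alpha=1.0` (document-wide only) token 3 trails token 0 clearly; with `alpha=0.5` the gap halves, because both tokens are equally salient within their segments.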
The paper, accepted at SIGIR 2026, includes quantitative and qualitative analyses that support its context-centric approach. By capturing both local and global saliency, MUDY extracts keyphrases that better represent a document's content, even when topics shift across sections. The source code is publicly available for reproducibility, which makes the method practical to try for researchers and practitioners in information retrieval, text summarization, and content indexing.
- MUDY introduces prompt-based scoring with candidate-aware weighting to capture local contextual importance.
- Self-attention scoring evaluates keyphrase significance at both document-wide and segment-specific granularity.
- MUDY outperforms state-of-the-art baselines on four real-world datasets across multiple top-k cutoff thresholds.
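For readers unfamiliar with top-k evaluation, the sketch below shows one common way to score a ranked keyphrase list at a cutoff k. The summary above does not name the metric, so F1@k with exact matching after lowercasing is an assumption here, and the function name and example phrases are hypothetical. Published evaluations often also stem phrases before matching, which this sketch omits.

```python
from typing import List


def f1_at_k(predicted: List[str], gold: List[str], k: int) -> float:
    """F1 between the top-k predicted keyphrases and the gold keyphrases.

    Matching is exact after lowercasing and stripping whitespace (an assumption).
    Assumes `predicted` is ranked best-first and contains no duplicates.
    """
    top_k = [p.lower().strip() for p in predicted[:k]]
    gold_set = {g.lower().strip() for g in gold}
    if not top_k or not gold_set:
        return 0.0
    hits = len(set(top_k) & gold_set)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Made-up ranking and gold labels, for illustration only.
predicted = ["keyphrase extraction", "language models", "attention", "datasets", "prompts"]
gold = ["Keyphrase Extraction", "attention", "unsupervised learning"]

for k in (1, 3, 5):
    print(k, f1_at_k(predicted, gold, k))
```

Reporting the score at several values of k shows whether a method ranks correct keyphrases near the top or only recovers them further down the list.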
Why It Matters
Better keyphrase extraction means more accurate document indexing, summarization, and search for professionals handling large text corpora.