Knowledge graphs help small LLMs but hurt large ones in zero-shot classification
Adding per-article knowledge graphs boosts small LLMs but hurts large ones, while self-consistency decoding raises cost roughly fivefold without improving results.
Shahana Akter and colleagues propose a zero-shot multi-label topic classification framework that operates without labeled training data. The base framework includes four variants: article-only classification, keyword-enhanced classification, and a self-consistency decoding variant of each. They then augment each variant with a per-article knowledge graph of subject-predicate-object triples, extracted with a KGGen-like pipeline. This yields eight methods (four base, four graph-augmented), tested across 15 large language models (LLMs) and eight multi-label datasets from different domains.
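To make the eight-method grid concrete, here is a minimal Python sketch of how the variants and their prompts might be assembled. Only the label "AK" comes from the study as summarized here; the other labels ("A", "+SC", "+KG"), the function names, and the prompt wording are hypothetical choices for illustration, not the authors' implementation.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def variant_names() -> List[str]:
    """Enumerate the 2 x 2 x 2 grid: keywords, self-consistency, graph.

    Labels other than "AK" are hypothetical, chosen for this sketch.
    """
    names = []
    for graph, self_consistency, keywords in product([False, True], repeat=3):
        name = "AK" if keywords else "A"
        if self_consistency:
            name += "+SC"
        if graph:
            name += "+KG"
        names.append(name)
    return names


def build_prompt(
    article: str,
    labels: Sequence[str],
    keywords: Optional[Sequence[str]] = None,
    triples: Optional[Sequence[Triple]] = None,
) -> str:
    """Assemble a zero-shot prompt; the wording is an assumption."""
    parts = [
        "Assign every topic that applies to the article.",
        "Topics: " + ", ".join(labels),
        "Article: " + article,
    ]
    if keywords:
        parts.append("Keywords: " + ", ".join(keywords))
    if triples:
        # Serialize each triple on its own line so the model sees the graph as text.
        lines = [f"({s}; {p}; {o})" for s, p, o in triples]
        parts.append("Knowledge graph triples:\n" + "\n".join(lines))
    return "\n".join(parts)
```

Calling `variant_names()` lists the four base labels first and the four graph-augmented labels after them; the graph-augmented prompt differs from its base counterpart only by the appended triples.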
Results reveal a clear divide: keyword-enhanced classification (AK) performed best among base methods, with six of 15 LLMs surpassing a sentence-encoder baseline. Graph augmentation, however, had opposite effects depending on model size: it improved small models but hurt large ones, suggesting that larger LLMs already encode sufficient relational knowledge from pretraining. Self-consistency decoding consistently failed to boost performance while increasing computational cost roughly fivefold. The study provides practical guidance on when to invest in knowledge graph augmentation for zero-shot classification.
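The cost of self-consistency decoding follows from how it works: the model is queried several times and the answers are aggregated. The sketch below illustrates one common aggregation for multi-label output, a per-label majority vote. The sample count of five (chosen to match the roughly fivefold cost), the 50% vote threshold, and the `classify` callable are all assumptions for illustration; the study's exact aggregation rule is not described here.

```python
from collections import Counter
from typing import Callable, List, Set


def self_consistent_labels(
    classify: Callable[[str], Set[str]],
    article: str,
    n_samples: int = 5,      # assumed; each sample is one full LLM call
    min_share: float = 0.5,  # assumed majority threshold
) -> List[str]:
    """Keep labels predicted in more than `min_share` of sampled runs."""
    votes: Counter = Counter()
    for _ in range(n_samples):
        # `classify` stands in for one stochastic LLM call (hypothetical).
        votes.update(set(classify(article)))
    return sorted(
        label for label, count in votes.items() if count / n_samples > min_share
    )
```

Because every sample is a separate model call, inference cost scales linearly with `n_samples`, which is why the technique only pays off if the vote actually changes the predictions for the better.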
- Framework uses four base variants (article-only, keyword-enhanced, plus a self-consistency variant of each) and four KG-augmented versions.
- Graph augmentation improves small LLMs but degrades large ones, likely because large models already encode relational knowledge from pretraining.
- Self-consistency decoding raises compute cost roughly fivefold and consistently failed to improve performance.
Why It Matters
Zero-shot classification is vital for rapid NLP deployment; this study indicates that knowledge graph augmentation is worth the cost for small models but not for large ones.