BoostTaxo: Zero-Shot Taxonomy Induction via Boosting LLM Reasoning
A lightweight LLM filters, a large LLM ranks, and structural calibration boosts accuracy.
BoostTaxo, introduced by Yancheng Ling and colleagues, tackles zero-shot taxonomy induction by combining two LLMs in a boosting-style pipeline. Given a set of domain terms, a lightweight LLM first filters potential parent candidates from a concept pool. Then, a large-scale LLM ranks and scores these candidates for fine-grained parent assignment. To improve structural reliability, the framework incorporates retrieval-augmented definition refinement and constraint-aware calibration that adjusts edge weights based on the emerging hierarchy. This coarse-to-fine process avoids costly exhaustive search while preserving accuracy.
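The coarse-to-fine flow described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: every function name, the fan-out penalty, the cycle check used as the "constraint," and the toy score table are assumptions made for illustration. Both LLMs are replaced by plain scoring functions, and retrieval-augmented definition refinement is left out entirely.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer signature: (child, candidate_parent) -> plausibility score.
Scorer = Callable[[str, str], float]


def coarse_filter(child: str, pool: List[str], cheap_score: Scorer, k: int) -> List[str]:
    """Stage 1: keep the k candidates the cheap scorer rates highest."""
    candidates = [p for p in pool if p != child]
    candidates.sort(key=lambda p: cheap_score(child, p), reverse=True)
    return candidates[:k]


def fine_rank(child: str, candidates: List[str], strong_score: Scorer) -> List[Tuple[str, float]]:
    """Stage 2: score only the shortlist with the expensive scorer."""
    scored = [(p, strong_score(child, p)) for p in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def creates_cycle(child: str, parent: str, parent_of: Dict[str, str]) -> bool:
    """True if attaching child under parent would make child its own ancestor."""
    node, seen = parent, set()
    while node is not None and node not in seen:
        if node == child:
            return True
        seen.add(node)
        node = parent_of.get(node)
    return False


def calibrate(child: str, ranked: List[Tuple[str, float]], parent_of: Dict[str, str],
              fanout_penalty: float = 0.05) -> List[Tuple[str, float]]:
    """Toy calibration (an assumption): drop cycle-forming edges and
    slightly down-weight parents that already have many children."""
    child_counts: Dict[str, int] = {}
    for p in parent_of.values():
        child_counts[p] = child_counts.get(p, 0) + 1
    adjusted = [
        (p, score - fanout_penalty * child_counts.get(p, 0))
        for p, score in ranked
        if not creates_cycle(child, p, parent_of)
    ]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


def induce_taxonomy(terms: List[str], pool: List[str], cheap_score: Scorer,
                    strong_score: Scorer, k: int = 3, min_score: float = 0.5) -> Dict[str, str]:
    """Assign each term at most one parent; terms with no confident parent stay roots."""
    parent_of: Dict[str, str] = {}
    for child in terms:
        shortlist = coarse_filter(child, pool, cheap_score, k)
        ranked = fine_rank(child, shortlist, strong_score)
        calibrated = calibrate(child, ranked, parent_of)
        if calibrated and calibrated[0][1] > min_score:
            parent_of[child] = calibrated[0][0]
    return parent_of


# Toy stand-ins for the two models (made-up scores, not real LLM output).
TOY_SCORES = {("dog", "animal"): 0.9, ("poodle", "dog"): 0.9, ("poodle", "animal"): 0.6}


def toy_strong(child: str, parent: str) -> float:
    return TOY_SCORES.get((child, parent), 0.0)


def toy_cheap(child: str, parent: str) -> float:
    # A cruder signal: only says whether the pair looks plausible at all.
    return 1.0 if (child, parent) in TOY_SCORES else 0.0


terms = ["animal", "dog", "poodle"]
taxonomy = induce_taxonomy(terms, terms, toy_cheap, toy_strong, k=2)
print(taxonomy)
```

The point of the split is cost: the expensive scorer is called at most `k` times per term instead of once per pool entry, which is where the savings over exhaustive search would come from under these assumptions.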
Evaluated on the WordNet, DBLP, and SemEval-Sci benchmarks, BoostTaxo matches or exceeds state-of-the-art zero-shot methods. Ablation studies confirm that both the hybrid candidate selection and the structure-aware calibration contribute significantly to performance. Further analysis examines how candidate pool size affects taxonomy quality and presents case studies of successes and failures. The framework is designed for scalable, domain-agnostic use, making it valuable for automating knowledge graph construction and ontology building from unstructured terms.
- A lightweight LLM filters candidate parents, then a large LLM ranks them in a coarse-to-fine selection.
- Structure-aware calibration adjusts edge weights using hierarchy features, improving reliability.
- Matches or exceeds state-of-the-art zero-shot methods on the WordNet, DBLP, and SemEval-Sci datasets.
Why It Matters
Automates taxonomy creation from terms in any domain, enabling faster knowledge graph and ontology building.