Enriching Taxonomies Using Large Language Models
A new AI pipeline proposes and validates new nodes to fix outdated, limited knowledge structures.
Researchers Zeinab Ghamlouch and Mehwish Alam have introduced Taxoria, a pipeline designed to automatically enrich and update structured knowledge taxonomies using Large Language Models (LLMs). Presented at ECAI 2025, the work addresses a critical bottleneck in knowledge management: many domain-specific taxonomies have limited coverage, contain ambiguous terms, and quickly become outdated, which reduces their utility for information retrieval and AI applications. Unlike methods that extract an LLM's internal knowledge to build a taxonomy from scratch, Taxoria starts from an existing, potentially sparse taxonomy and uses it as a seed. It then draws on the broad knowledge of an LLM to propose relevant candidate nodes for expansion, bridging human-curated structure with AI-scale knowledge.
The Taxoria pipeline works in two stages: generation and validation. First, it prompts an LLM (such as GPT-4 or Claude) to suggest new nodes, for example siblings or children of existing taxonomy terms. These AI-generated candidates then pass through a validation step that filters out hallucinations and checks that each candidate is semantically relevant and coherent with the original structure. The final output is not just a list of terms but an enriched, merged taxonomy with provenance tracking, which shows which nodes were AI-suggested, along with visualization tools for analysis. This gives organizations a scalable, semi-automated way to maintain and expand their knowledge graphs, product catalogs, or scientific ontologies, keeping them current with evolving terminology and concepts without relying on purely manual effort.
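To make the generate-then-validate flow concrete, here is a minimal sketch in Python. It is not Taxoria's actual code or API: the `Node` class, `enrich`, `render`, and the two stand-in functions `fake_propose` and `fake_validate` are hypothetical names invented for illustration, and the canned suggestions replace a real LLM call. The validation rule is also an assumption, since the summary above does not say how Taxoria validates candidates.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    label: str
    source: str = "seed"  # provenance: "seed" (human-curated) or "llm" (AI-suggested)
    children: List["Node"] = field(default_factory=list)


def enrich(node: Node,
           propose: Callable[[str], List[str]],
           validate: Callable[[str, str], bool]) -> int:
    """Add validated candidates under each seed node; return how many were added."""
    added = 0
    existing = {c.label.lower() for c in node.children}
    # Visit the original children first, so newly added nodes are not expanded again.
    for child in list(node.children):
        added += enrich(child, propose, validate)
    for candidate in propose(node.label):            # stage 1: generation
        name = candidate.strip()
        if not name or name.lower() in existing:
            continue                                  # skip blanks and duplicates
        if validate(node.label, name):                # stage 2: validation
            node.children.append(Node(name, source="llm"))
            existing.add(name.lower())
            added += 1
    return added


def render(node: Node, depth: int = 0) -> List[str]:
    """Indented text view that marks AI-suggested nodes."""
    tag = " [llm]" if node.source == "llm" else ""
    lines = ["  " * depth + node.label + tag]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def fake_propose(parent: str) -> List[str]:
    # Hypothetical stand-in for an LLM prompt; returns canned suggestions.
    canned = {"Fruit": ["Apple", "Mango", "Carburetor"],
              "Vehicle": ["Car", "Bicycle"]}
    return canned.get(parent, [])


def fake_validate(parent: str, candidate: str) -> bool:
    # Hypothetical stand-in for the validation step; rejects one "hallucination".
    return candidate != "Carburetor"


# A small seed taxonomy: Thing -> Fruit -> Apple, and Thing -> Vehicle.
root = Node("Thing", children=[Node("Fruit", children=[Node("Apple")]),
                               Node("Vehicle")])
added = enrich(root, fake_propose, fake_validate)
print(f"added {added} nodes")
print("\n".join(render(root)))
```

In this toy run the duplicate "Apple" and the rejected "Carburetor" are dropped, while accepted candidates are merged into the seed tree and tagged `llm`, so a reviewer can tell curated nodes from suggested ones.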
- Taxoria uses an existing taxonomy as a seed for LLM-powered expansion, unlike extraction-only methods.
- The pipeline includes a validation step to filter hallucinations, ensuring semantic relevance of new nodes.
- Output includes an enriched taxonomy with provenance tracking and visualization for expert analysis.
Why It Matters
Taxoria gives organizations a semi-automated way to keep product catalogs, knowledge graphs, and scientific ontologies current and comprehensive.