Research & Papers

Methods for Knowledge Graph Construction from Text Collections: Development and Applications

New research combines NLP, ML, and Generative AI to build structured knowledge from massive text collections.

Deep Dive

Researcher Vanni Zavarella has published a PhD thesis (arXiv:2603.25862) detailing methods for the automatic construction of Knowledge Graphs (KGs) from large collections of unstructured text. The work addresses the challenge of extracting actionable, semantically rich knowledge from the growing volume of textual data in domains such as news, social media, scholarly publications, and healthcare records. By integrating Natural Language Processing (NLP), Machine Learning (ML), and Generative AI with Semantic Web best practices, the thesis proposes frameworks designed to scale and to adapt to different text genres and schema requirements.

The research is grounded in three large-scale applications. First, it analyzes the discourse around Digital Transformation in global news and social media platforms. Second, it maps recent research and identifies trends in a large corpus of publications from the Architecture, Engineering, Construction, and Operations (AECO) domain. Third, it generates graphs of causal relations between biomedical entities extracted from electronic health records and patient-authored drug reviews. The contributions include benchmark evaluation results, customized algorithms, and publicly available KG data resources that are semantically transparent and interoperable by design.
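
To make the idea of a semantically transparent, interoperable KG resource more concrete, here is a minimal sketch of how extracted subject-relation-object statements might be serialized as N-Triples, a standard line-based RDF format. It is not taken from the thesis: the http://example.org/kg/ namespace, the helper names iri and to_ntriples, and the sample statements are all hypothetical placeholders, and a real pipeline would likely map entities and relations to established vocabularies rather than mint IRIs from raw labels.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration only; not from the thesis.
BASE = "http://example.org/kg/"


def iri(label: str) -> str:
    """Turn a free-text label into an IRI in the hypothetical namespace."""
    local_name = quote(label.strip().replace(" ", "_"))
    return f"<{BASE}{local_name}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) string tuples as N-Triples.

    Duplicate statements are dropped; first-seen order is preserved.
    """
    seen = set()
    lines = []
    for subj, pred, obj in triples:
        line = f"{iri(subj)} {iri(pred)} {iri(obj)} ."
        if line not in seen:
            seen.add(line)
            lines.append(line)
    return "\n".join(lines)


# Made-up extraction output, standing in for what an NLP step might emit.
extracted = [
    ("drug A", "causes", "headache"),
    ("drug A", "causes", "headache"),  # duplicate mention in another document
    ("drug A", "treats", "condition B"),
]

print(to_ntriples(extracted))
```

Each output line is a self-contained statement that RDF-aware tools can load, which is one reason line-based RDF formats are convenient for publishing large graphs.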

Key Points
  • Combines NLP, ML, and Generative AI with Semantic Web techniques to automate Knowledge Graph construction from text.
  • Applied in three domains: digital transformation media analysis, AECO research mapping, and biomedical causal graph generation.
  • Delivers benchmark results, custom algorithms, and publicly available KG data resources for the research community.

Why It Matters

Provides a scalable blueprint for turning massive, messy text data into structured, explainable, and actionable knowledge for enterprises and research.