Research & Papers

Performance Evaluation of LLMs in Automated RDF Knowledge Graph Generation

New research shows Llama outperforms Qwen and Gemma, achieving near-perfect accuracy in transforming messy logs into structured data.

Deep Dive

A team from Babes-Bolyai University has published research evaluating how well Large Language Models (LLMs) can automatically transform complex cloud system logs into structured RDF knowledge graphs. The study, led by Ioana Ramona Martin, Tudor Cioara, and Ionut Anghel, addresses a gap in cloud operations by creating the first public Log-to-KG dataset from OpenStack logs, providing a much-needed benchmark for this emerging application. Their framework uses two pipelines, one for extraction and one for validation, to systematically test different prompting strategies across multiple LLM architectures.

The results reveal that Few-Shot learning is dramatically more effective than other approaches, with Meta's Llama model achieving a remarkable 99.35% F1 score and producing 100% syntactically valid RDF triples. Other models such as Qwen, NuExtract, and Gemma also performed well under Few-Shot prompting, and Chain-of-Thought prompting achieved comparable accuracy. Surprisingly, more elaborate strategies like Tree-of-Thought, Self-Critique, and Generate-Multiple performed substantially worse, suggesting that for this specific task, providing contextual examples through Few-Shot prompting is more valuable than complex reasoning frameworks.
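To make the Few-Shot approach concrete, here is a minimal sketch of how such a prompt might be assembled. The log lines, URIs, and triple format below are hypothetical illustrations, not the paper's actual dataset or prompt template:

```python
# Illustrative Few-Shot prompt builder for log-to-RDF extraction.
# All example log lines and URNs are invented for demonstration.

FEW_SHOT_EXAMPLES = [
    (
        "nova-compute ERROR Instance 4f2a failed to spawn",
        '<urn:log:instance-4f2a> <urn:log:hasStatus> "failed_to_spawn" .',
    ),
    (
        "neutron-agent INFO Port 9b1c bound to host node-3",
        '<urn:log:port-9b1c> <urn:log:boundToHost> "node-3" .',
    ),
]

def build_prompt(log_line: str) -> str:
    """Assemble a Few-Shot prompt: instruction, worked examples, then the query."""
    parts = ["Convert each cloud log line into an RDF triple in N-Triples syntax.\n"]
    for log, triple in FEW_SHOT_EXAMPLES:
        parts.append(f"Log: {log}\nTriple: {triple}\n")
    parts.append(f"Log: {log_line}\nTriple:")
    return "\n".join(parts)

prompt = build_prompt("cinder-volume WARNING Volume 7d8e detached unexpectedly")
print(prompt)
```

The prompt ends at "Triple:", so the model's continuation is the extracted triple; the study's finding is that a couple of worked examples like these outperform heavier reasoning scaffolds.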

The research provides concrete guidance for DevOps and cloud engineering teams looking to automate log analysis. The findings suggest that with well-designed Few-Shot prompts, organizations can reliably convert heterogeneous cloud logs into structured knowledge graphs without extensive manual annotation. This enables automated root-cause analysis, cross-service reasoning, and improved security monitoring that raw logs alone cannot provide.
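The study's validation pipeline checks that generated triples are syntactically valid RDF. As a simplified stand-in, a team could gate model output with a lightweight check like the following; this regex covers only a narrow N-Triples shape (IRI subject/predicate, IRI or plain-literal object), whereas a production validator would use a real RDF parser such as rdflib:

```python
import re

# Minimal, illustrative syntactic check for N-Triples-style output.
# Not the paper's validator; a real pipeline should parse with an RDF library.
TRIPLE_RE = re.compile(
    r'^<[^<>\s]+>\s+'           # subject IRI
    r'<[^<>\s]+>\s+'            # predicate IRI
    r'(<[^<>\s]+>|"[^"]*")\s*'  # object: IRI or plain literal
    r'\.$'                      # terminating dot
)

def is_valid_triple(line: str) -> bool:
    """Return True if a line looks like a well-formed N-Triples statement."""
    return bool(TRIPLE_RE.match(line.strip()))

print(is_valid_triple('<urn:log:vm-42> <urn:log:hasStatus> "error" .'))  # True
print(is_valid_triple('vm-42 hasStatus error'))                          # False
```

Triples that fail the check can be rejected or re-prompted, which is one practical way to approach the 100% syntactic-validity figure reported for Llama.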

Key Points
  • Meta's Llama model achieved 99.35% F1 score with 100% valid RDF output using Few-Shot prompting
  • Researchers created first public Log-to-KG dataset from OpenStack logs to enable objective benchmarking
  • Few-Shot learning outperformed advanced strategies like Tree-of-Thought and Self-Critique by substantial margins

Why It Matters

Enables automated transformation of messy cloud logs into structured knowledge graphs for better troubleshooting and security analysis.