A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification
A new architecture uses Llama 3.1 for segmentation and a fine-tuned legal model for classification.
A team of researchers has published an AI architecture for analyzing Non-Disclosure Agreements (NDAs), a task that is tedious and error-prone when done by hand. The system, detailed in a paper for STIL @ BRACIS 2025, addresses the core challenge of NDAs: their significant variation in format, structure, and legal phrasing. To automate analysis, the team built a two-stage pipeline that first splits a document into its constituent clauses and then assigns each clause a type.
The first stage uses Meta's open-source Llama-3.1-8B-Instruct model for document segmentation, extracting individual clauses from the full NDA text. This step achieved a ROUGE F1 score of 0.95. ROUGE measures textual overlap with a reference, so the score indicates that the extracted clauses closely match the reference segmentation. The second stage uses a specialized transformer model, a fine-tuned version of Legal-Roberta-Large, to classify each extracted clause into predefined categories (such as confidentiality, term, or liability). This classification stage achieved a weighted F1 score of 0.85, meaning the per-category F1 scores, each balancing precision and recall, average 0.85 when weighted by how often each category occurs.
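To make the two reported metrics concrete, here is a minimal Python sketch of how a ROUGE-style F1 and a weighted F1 can be computed. It assumes the unigram variant (ROUGE-1) with simple whitespace tokenization; the article does not say which ROUGE variant or tokenizer the authors used, and the function names `rouge1_f1` and `weighted_f1` are illustrative, not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference clause and an extracted clause.

    Assumption: lowercased whitespace tokens; the paper's exact ROUGE
    variant and preprocessing are not stated in the article.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's true count."""
    if not y_true:
        return 0.0
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)
```

A candidate that recovers only half of a reference clause is penalized on recall, and a rare category contributes less to the weighted F1 than a frequent one, which is why the weighted score can hide weak performance on uncommon clause types.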
The architecture pairs a general-purpose LLM, which handles document structure, with a domain-specific model, which handles legal nuance. It illustrates how hybrid AI approaches can be applied to real-world document processing tasks where format consistency cannot be assumed.
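The division of labor between the two stages can be sketched as a pipeline that takes the segmenter and the classifier as interchangeable functions. This is an illustration under stated assumptions, not the authors' implementation: `analyze_nda`, `toy_segment`, and `toy_classify` are hypothetical names, and the toy functions are simple rule-based stand-ins for where calls to Llama-3.1-8B-Instruct and the fine-tuned Legal-Roberta-Large model would go.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LabeledClause:
    text: str
    label: str


# Stage interfaces: any callable with these shapes can be plugged in.
Segmenter = Callable[[str], List[str]]   # full NDA text -> clause texts
Classifier = Callable[[str], str]        # clause text -> category label


def analyze_nda(document: str, segment: Segmenter, classify: Classifier) -> List[LabeledClause]:
    """Run stage 1 (segmentation), then stage 2 (classification) per clause."""
    clauses = [c.strip() for c in segment(document) if c.strip()]
    return [LabeledClause(text=c, label=classify(c)) for c in clauses]


def toy_segment(document: str) -> List[str]:
    # Hypothetical stand-in for the LLM segmenter: split on numbered
    # headings such as "1. " at the start of a line.
    return re.split(r"(?m)^\s*\d+\.\s+", document)


def toy_classify(clause: str) -> str:
    # Hypothetical stand-in for the fine-tuned classifier: keyword lookup
    # over the example categories named in the article.
    keywords = {
        "confidentiality": ("confidential",),
        "term": ("remains in effect", "years"),
        "liability": ("liable", "liability"),
    }
    lowered = clause.lower()
    for label, words in keywords.items():
        if any(word in lowered for word in words):
            return label
    return "other"


if __name__ == "__main__":
    sample = (
        "1. The Recipient shall keep all Confidential Information secret.\n"
        "2. This Agreement remains in effect for two years.\n"
        "3. Neither party is liable for indirect damages.\n"
    )
    for item in analyze_nda(sample, toy_segment, toy_classify):
        print(f"{item.label}: {item.text}")
```

Keeping the stages behind plain function interfaces means either model can be replaced or evaluated on its own, which mirrors how the article reports a separate score for each stage.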
- Uses Llama-3.1-8B-Instruct for document segmentation, achieving a ROUGE F1 score of 0.95.
- Classifies clauses with a fine-tuned Legal-Roberta-Large model, scoring a weighted F1 of 0.85.
- Automates analysis of highly variable NDAs, a task traditionally slow and prone to human error.
Why It Matters
If the reported scores hold up on real-world contracts, a pipeline like this could reduce the time and cost of legal document review, making due diligence faster and more scalable for businesses.