Research & Papers

A Few Good Clauses: Comparing LLMs vs Domain-Trained Small Language Models on Structured Contract Extraction

Small legal AI model outperforms GPT-4 class models with fewer hallucinations

Deep Dive

A new study from researchers Nicole Lincoln, Nick Whitehouse, Jaron Mar, and Rivindu Perera evaluates whether a domain-trained small language model can outperform frontier LLMs on structured contract extraction at radically lower cost. They tested Olava Extract, a self-hosted legal domain Mixture-of-Experts model, against five frontier models including GPT-4 class systems. Olava Extract achieved the strongest aggregate performance with a macro F1 of 0.812 and micro F1 of 0.842, while slashing inference costs by 78% to 97%. Critically, it also produced the fewest hallucinated or unsupported extractions, a major advantage in legal workflows where errors create operational risk and downstream review burden.
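The two aggregate scores differ in how they weight clause types: macro F1 averages the per-type F1 scores equally, while micro F1 pools all true/false positives and negatives, so frequent clause types dominate. A minimal sketch with hypothetical counts (not the paper's data or evaluation code) shows the difference:

```python
def f1(tp, fp, fn):
    """Standard F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical per-clause-type counts for a contract-extraction run.
counts = {
    "termination":   {"tp": 90, "fp": 5,  "fn": 5},
    "governing_law": {"tp": 40, "fp": 10, "fn": 10},
    "indemnity":     {"tp": 8,  "fp": 6,  "fn": 6},   # rare, harder clause type
}

# Macro: unweighted mean of per-type F1 scores.
macro_f1 = sum(f1(**c) for c in counts.values()) / len(counts)

# Micro: pool the counts, then compute a single F1.
micro_f1 = f1(
    sum(c["tp"] for c in counts.values()),
    sum(c["fp"] for c in counts.values()),
    sum(c["fn"] for c in counts.values()),
)
print(round(macro_f1, 3), round(micro_f1, 3))  # 0.773 0.868
```

Here micro F1 exceeds macro F1 because the common clause types are extracted well; a model that also handles rare clause types closes the gap, which is why reporting both, as the study does, is informative.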

The findings challenge the assumption that commercially valuable enterprise AI requires ever-larger models, massive infrastructure, and centrally hosted providers. Olava Extract's success demonstrates that high-performing, human-comparable legal AI no longer depends on the largest externally hosted models. For enterprises handling sensitive legal documents, this opens the door to self-hosted solutions that deliver superior accuracy at a fraction of the cost, with dramatically lower risk of hallucinations. The paper suggests a broader shift: domain-specific small models, trained on carefully curated data, can surpass general-purpose frontier models in specialized tasks while being far more economical and secure.

Key Points
  • Olava Extract (legal domain MoE SLM) achieved macro F1 0.812 and micro F1 0.842, outperforming all five frontier LLMs tested.
  • Inference costs cut by 78% to 97% relative to the frontier models, aided by self-hosted deployment.

  • Highest precision and fewest hallucinated or unsupported extractions among the models tested, critical for legal workflows where errors create operational risk.
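An "unsupported" extraction is one whose text cannot be traced back to the source contract. A minimal grounding check of this kind can be sketched as follows (an illustrative heuristic, not the paper's method; all field names and text are made up):

```python
import re

def unsupported_extractions(contract_text: str, extractions: dict) -> list:
    """Return the field names whose extracted value does not appear
    verbatim (ignoring case and whitespace runs) in the contract."""
    normalized = re.sub(r"\s+", " ", contract_text).lower()
    flagged = []
    for field, value in extractions.items():
        needle = re.sub(r"\s+", " ", value).lower()
        if needle not in normalized:
            flagged.append(field)
    return flagged

contract = ("This Agreement shall be governed by the laws of New York. "
            "Either party may terminate on 30 days' notice.")
extracted = {
    "governing_law": "laws of New York",
    "termination_notice": "30 days' notice",
    "liability_cap": "$1,000,000",   # hallucinated: nowhere in the contract
}
print(unsupported_extractions(contract, extracted))  # ['liability_cap']
```

Real evaluations are looser than exact substring matching (paraphrase and normalization complicate things), but the principle is the same: every extracted value should be attributable to the source document, and flagged fields feed the downstream review burden the study highlights.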

Why It Matters

Shows that domain-specific small models can deliver superior enterprise AI at radically lower cost and risk.