Research & Papers

A Few Good Clauses: Comparing LLMs vs Domain-Trained Small Language Models on Structured Contract Extraction

Small legal AI model outperforms GPT-4 class models with fewer hallucinations

Deep Dive

A new study from researchers Nicole Lincoln, Nick Whitehouse, Jaron Mar, and Rivindu Perera evaluates whether a domain-trained small language model can outperform frontier LLMs on structured contract extraction at radically lower cost. They tested Olava Extract, a self-hosted legal domain Mixture-of-Experts model, against five frontier models including GPT-4 class systems. Olava Extract achieved the strongest aggregate performance with a macro F1 of 0.812 and micro F1 of 0.842, while slashing inference costs by 78% to 97%. Critically, it also produced the fewest hallucinated or unsupported extractions, a major advantage in legal workflows where errors create operational risk and downstream review burden.
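The two aggregate scores differ in how they weight clause types: macro F1 averages the per-type F1 scores equally, while micro F1 pools all true/false positives and negatives, so frequent clause types dominate. A minimal sketch with hypothetical counts (not the paper's data or evaluation code) shows the difference:

```python
def f1(tp, fp, fn):
    """Standard F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical per-clause-type counts for a contract-extraction run.
counts = {
    "termination":   {"tp": 90, "fp": 5,  "fn": 5},
    "governing_law": {"tp": 40, "fp": 10, "fn": 10},
    "indemnity":     {"tp": 8,  "fp": 6,  "fn": 6},   # rare, harder clause type
}

# Macro: unweighted mean of per-type F1 scores.
macro_f1 = sum(f1(**c) for c in counts.values()) / len(counts)

# Micro: pool the counts, then compute a single F1.
micro_f1 = f1(
    sum(c["tp"] for c in counts.values()),
    sum(c["fp"] for c in counts.values()),
    sum(c["fn"] for c in counts.values()),
)
print(round(macro_f1, 3), round(micro_f1, 3))  # 0.773 0.868
```

Here micro F1 exceeds macro F1 because the common clause types are extracted well; a model that also handles rare clause types closes the gap, which is why reporting both, as the study does, is informative.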

The findings challenge the assumption that commercially valuable enterprise AI requires ever-larger models, massive infrastructure, and centrally hosted providers. Olava Extract's success demonstrates that high-performing, human-comparable legal AI no longer depends on the largest externally hosted models. For enterprises handling sensitive legal documents, this opens the door to self-hosted solutions that deliver superior accuracy at a fraction of the cost, with dramatically lower risk of hallucinations. The paper suggests a broader shift: domain-specific small models, trained on carefully curated data, can surpass general-purpose frontier models in specialized tasks while being far more economical and secure.

Key Points
  • Olava Extract (legal domain MoE SLM) achieved macro F1 0.812 and micro F1 0.842, outperforming all five frontier LLMs tested.
  • Inference costs cut by 78% to 97% relative to the frontier models, aided by self-hosted deployment.

  • Highest precision and fewest hallucinated or unsupported extractions among the models tested, critical for legal workflows where errors create operational risk.
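An "unsupported" extraction is one whose text cannot be traced back to the source contract. A minimal grounding check of this kind can be sketched as follows (an illustrative heuristic, not the paper's method; all field names and text are made up):

```python
import re

def unsupported_extractions(contract_text: str, extractions: dict) -> list:
    """Return the field names whose extracted value does not appear
    verbatim (ignoring case and whitespace runs) in the contract."""
    normalized = re.sub(r"\s+", " ", contract_text).lower()
    flagged = []
    for field, value in extractions.items():
        needle = re.sub(r"\s+", " ", value).lower()
        if needle not in normalized:
            flagged.append(field)
    return flagged

contract = ("This Agreement shall be governed by the laws of New York. "
            "Either party may terminate on 30 days' notice.")
extracted = {
    "governing_law": "laws of New York",
    "termination_notice": "30 days' notice",
    "liability_cap": "$1,000,000",   # hallucinated: nowhere in the contract
}
print(unsupported_extractions(contract, extracted))  # ['liability_cap']
```

Real evaluations are looser than exact substring matching (paraphrase and normalization complicate things), but the principle is the same: every extracted value should be attributable to the source document, and flagged fields feed the downstream review burden the study highlights.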

Why It Matters

Shows that domain-specific small models can deliver superior enterprise AI at radically lower cost and risk.