New hybrid AI framework boosts legal case retrieval beyond baselines
A two-stage system segments judgments and combines BM25 with dense vectors for better analogical precedent finding.
Rajith Arulanandam and Nisasa de Silva have introduced a section-weighted hybrid framework for legal case retrieval, designed to capture legal reasoning rather than surface word overlap. As an offline preprocessing step, the system uses a deterministic large language model (LLM) to segment raw legal judgments into four sections: facts, issues, decision, and reasoning. Retrieval then runs in two stages.
In Stage 1, the system runs parallel lexical (BM25) and semantic (dense ANN) whole-document searches, then merges the results with Reciprocal Rank Fusion (RRF) to build a high-recall candidate pool. Stage 2 re-ranks this pool with fine-grained, like-for-like section comparisons, for instance matching the query's reasoning section against each candidate's reasoning section. Because BM25 scores are unbounded while cosine similarities are bounded, the two are on different scales; the authors therefore apply query-wise Z-score normalization before aggregating the signals with learned section weights.
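To illustrate how Stage 1 fusion works, here is a minimal sketch of standard Reciprocal Rank Fusion over two ranked lists. It is not the authors' code: the function name, the document IDs, and the smoothing constant k = 60 (a commonly used default) are all assumptions for illustration. Each list contributes 1 / (k + rank) for every document it contains, so a case ranked well by both BM25 and the dense retriever rises to the top of the pool.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: list of ranked lists, best result first (e.g. BM25 hits, dense ANN hits).
    k: smoothing constant; 60 is an assumed, commonly used default.
    top_n: optional cut-off for the fused candidate pool (None keeps everything).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document absent from a list simply receives nothing from that list.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n]


# Hypothetical result lists from the two whole-document searches.
bm25_hits = ["case_A", "case_B", "case_C"]
dense_hits = ["case_B", "case_D", "case_A"]
pool = reciprocal_rank_fusion([bm25_hits, dense_hits], k=60)
print([doc for doc, _ in pool])  # case_B, case_A, case_D, case_C
```

RRF uses only rank positions, not raw scores, which is why it can merge BM25 and cosine results without first reconciling their scales.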
For top results, the system returns the relevant section text, a concise grounded rationale, and party-stance labels. Evaluated on a jurisdiction-scale benchmark, the approach consistently outperforms strong lexical (BM25) and neural (dense ANN) baselines while maintaining high candidate coverage. The paper is 10 pages with 4 figures and has been accepted to the International Conference on Natural Language Processing (ICNLP 2026).
- Uses a deterministic LLM to segment legal judgments into facts, issues, decision, and reasoning for more precise retrieval.
- Combines BM25 and dense ANN search via Reciprocal Rank Fusion in Stage 1, then refines with like-for-like section comparisons.
- Applies query-wise Z-score normalization so that BM25 and cosine scores are comparable before section-weighted similarity signals are aggregated.
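The normalization step can be sketched as follows, assuming each Stage 2 signal is one score per candidate for a single query, and assuming the signals are combined by a weighted sum. The signal names and weight values below are hypothetical placeholders (the paper learns its section weights), so treat this as an illustration of query-wise Z-scoring, not the authors' implementation.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one query's scores across its candidate pool."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0.0:
        # Every candidate scored the same, so this signal cannot separate them.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def aggregate(signals, weights):
    """Weighted sum of z-scored signals for ONE query.

    signals: {signal_name: [score per candidate]}, all lists the same length.
    weights: {signal_name: weight}; hypothetical fixed values in this sketch.
    """
    n = len(next(iter(signals.values())))
    totals = [0.0] * n
    for name, raw in signals.items():
        for i, z in enumerate(zscore(raw)):
            totals[i] += weights.get(name, 0.0) * z
    return totals


candidates = ["case_A", "case_B", "case_C"]
signals = {
    "reasoning_bm25": [12.0, 30.0, 18.0],    # unbounded lexical scores
    "reasoning_cosine": [0.82, 0.61, 0.77],  # cosine similarities
    "facts_cosine": [0.55, 0.70, 0.60],
}
weights = {"reasoning_bm25": 0.3, "reasoning_cosine": 0.5, "facts_cosine": 0.2}

scores = aggregate(signals, weights)
ranked = sorted(zip(candidates, scores), key=lambda pair: -pair[1])
print(ranked)
```

Normalizing per query, rather than over the whole corpus, keeps each signal's contribution relative to the candidates actually competing for that query, so a raw BM25 value of 30 no longer swamps a cosine similarity of 0.8.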
Why It Matters
Retrieving precedents by their legal reasoning, not just shared vocabulary, could cut research time for legal professionals, and the grounded rationales make each suggested match easier to verify.