SG-LegalCite: New benchmark boosts legal AI with principle-aware retrieval
100,890 case-principle pairs from 8,523 Singapore judgments redefine legal citation search.
Researchers have introduced SG-LegalCite, a benchmark for legal citation retrieval that targets a fundamental flaw in existing systems: they often retrieve precedents that are factually similar but doctrinally irrelevant. The dataset comprises 100,890 case-principle pairs extracted from 8,523 Singapore Supreme Court judgments spanning 2000 to 2025. Each query combines the case facts with an explicit legal principle, a design meant to mirror real-world legal reasoning workflows. Tests across 11 baselines show that the added principle provides a strong discriminative signal and significantly improves retrieval accuracy.
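To make the principle-augmented query idea concrete, here is a minimal sketch in Python. It is not the benchmark's method or any of its 11 baselines: the bag-of-words cosine scorer, the function names (`build_query`, `rank`), and the two candidate texts are all hypothetical, chosen only to show how adding a principle to the query can change which precedent ranks first.

```python
import math
import re
from collections import Counter

# Hypothetical candidate precedents (invented for illustration, not from SG-LegalCite).
CANDIDATES = {
    "case_a": "driver collided with cyclist at junction; court applied duty of care "
              "and remoteness of damage in negligence",
    "case_b": "driver collided with cyclist at junction; court considered sentencing "
              "for dangerous driving offence",
}
FACTS = "driver collided with cyclist at junction"
PRINCIPLE = "duty of care in negligence"


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_query(facts, principle=None):
    """Return a facts-only query, or facts plus an explicit legal principle."""
    return facts if principle is None else f"{facts} {principle}"


def rank(query, candidates):
    """Return candidate ids sorted by descending similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), cid)
              for cid, text in candidates.items()]
    return [cid for _, cid in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    print("facts only:        ", rank(build_query(FACTS), CANDIDATES))
    print("facts + principle: ", rank(build_query(FACTS, PRINCIPLE), CANDIDATES))
```

In this toy setup both candidates match the facts equally well, so the facts-only query favours the shorter sentencing case, while the augmented query moves the negligence case to the top. Real systems would use stronger retrievers, but the query construction step would look similar.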
This work is particularly consequential for Singapore's legal system, which has evolved independently. Only domestic precedents are binding; foreign authorities serve as persuasive references. SG-LegalCite is designed to test whether models retrieve legally correct citations rather than merely factually similar ones. The benchmark is publicly available via arXiv and is expected to drive advances in legal AI, especially for jurisdictions with distinct common-law traditions. The authors plan to expand the dataset to more jurisdictions and to explore integration with large language models.
- Dataset: 100,890 case-principle pairs from 8,523 Singapore Supreme Court judgments (2000–2025).
- New paradigm: Queries combine case facts with explicit legal principles; tests across 11 baselines show improved retrieval accuracy.
- Critical for Singapore: Only domestic precedents are binding; foreign ones are merely persuasive.
Why It Matters
Principle-aware retrieval helps legal AI cite doctrinally relevant precedents, not just factually similar ones.