
Health System Scale Semantic Search Across Unstructured Clinical Notes

arXiv cs.IR April 29, 2026

⚡Sub-second search across 166 million clinical notes reduces chart review time by up to 89%

Deep Dive

Researchers from a major children's hospital have published a paper demonstrating the first health-system-scale deployment of semantic search across unstructured clinical notes. The system indexes 166 million notes from 1.68 million patients, represented as 484 million vector embeddings. It uses instruction-tuned Qwen3-Embedding-0.6B embeddings with a 300-token chunking strategy. Vectors are stored in a managed database with storage-optimized indexing, while full-text metadata is kept in a low-latency key-value store. The entire pipeline operates within a HIPAA-compliant governance framework.
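
The chunk-then-index layout described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: whitespace-separated words stand in for model tokens (a real deployment would count tokens with the embedding model's tokenizer), the embedding step itself is omitted, and the names `chunk_note`, `build_records`, and the `note_id:chunk_number` key format are hypothetical.

```python
from typing import Dict, List, Tuple

CHUNK_TOKENS = 300  # chunk size reported in the paper


def chunk_note(text: str, chunk_tokens: int = CHUNK_TOKENS) -> List[str]:
    """Split a note into consecutive chunks of at most `chunk_tokens` tokens.

    Assumption: whitespace-separated words approximate model tokens.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + chunk_tokens])
        for i in range(0, len(tokens), chunk_tokens)
    ]


def build_records(
    notes: Dict[str, str],
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Return (chunks queued for embedding, key-value store of chunk text).

    The first list is what would be embedded and written to the vector
    database; the dict plays the role of the low-latency key-value store
    that maps a chunk ID back to its full text.
    """
    to_embed: List[Tuple[str, str]] = []
    text_store: Dict[str, str] = {}
    for note_id, text in notes.items():
        for n, chunk in enumerate(chunk_note(text)):
            chunk_id = f"{note_id}:{n}"  # hypothetical key format
            to_embed.append((chunk_id, chunk))
            text_store[chunk_id] = chunk
    return to_embed, text_store


# A 650-word note yields chunks of 300, 300, and 50 words.
notes = {"note-1": "word " * 650}
to_embed, text_store = build_records(notes)
print(len(to_embed))  # 3
```

Keeping chunk text out of the vector index and in a separate store is one plausible reading of the design: the index only needs IDs and vectors, and text is fetched by ID after the nearest-neighbor lookup.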

The system delivers sub-second query latency (median 237 ms for a single user, 451 ms with 20 concurrent users) at a monthly cost of roughly $4,000. On a physician-authored clinical question-answering benchmark, the Qwen3 embeddings with 300-token chunks achieved 94.6% accuracy. In a real-world clinical utility evaluation across three chart abstraction tasks, semantic search reduced time-to-completion by 24% to 89% compared to manual clinician chart review, while maintaining comparable inter-rater agreement. The authors conclude that health-system-scale semantic search is both technically and operationally feasible, enabling interactive search, cohort generation, and downstream LLM-powered clinical applications without requiring specialized informatics expertise.
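
To put the reported scale and cost in proportion, a back-of-envelope calculation from the figures quoted above may help. The inputs are the rounded numbers from this summary (the monthly cost in particular is given only as "roughly $4,000"), so the derived ratios are approximate and are not figures stated by the paper.

```python
# Rounded figures as quoted in the summary above.
NOTES = 166_000_000
VECTORS = 484_000_000
PATIENTS = 1_680_000
MONTHLY_COST_USD = 4_000  # approximate

vectors_per_note = VECTORS / NOTES
notes_per_patient = NOTES / PATIENTS
cost_per_million_notes = MONTHLY_COST_USD / (NOTES / 1_000_000)

print(f"{vectors_per_note:.1f} vectors per note")        # 2.9
print(f"{notes_per_patient:.1f} notes per patient")      # 98.8
print(f"${cost_per_million_notes:.2f} per million notes per month")  # $24.10
```

On these numbers, the average note spans about three 300-token chunks, and the average patient has roughly a hundred indexed notes.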

Key Points

System indexes 166M clinical notes (484M vectors) from 1.68M patients using Qwen3-Embedding-0.6B
94.6% accuracy on clinical QA benchmark with 300-token chunks; sub-second latency (237 ms single-user)
Reduced chart abstraction time by 24–89% across three tasks with comparable inter-rater agreement

Why It Matters

Shows that semantic search over a health system's full collection of clinical notes is technically and operationally feasible, cutting chart abstraction time by 24% to 89% in the paper's evaluation.
