Cloud-native LLM system helps investigators query evidence with plain English
New architecture translates natural language into search queries for criminal investigations.
Researchers Benjamin Puhani, Kai Brehmer, and Malte Prieß have designed a cloud-native microservice architecture for integrating Large Language Models into investigative searches using OpenSearch. The system, presented at the CLOUD COMPUTING 2026 conference, aims to bridge the semantic gap between natural-language investigative intent and technical search logic. It is designed for private-cloud deployments, targeting the security and scalability needed to handle large volumes of unstructured evidence in criminal investigations.
The architecture employs a human-in-control workflow in which LLMs translate plain-English queries into syntactically valid OpenSearch Domain-Specific Language (DSL) expressions. A hybrid retrieval strategy combines BM25-based lexical search with nested semantic vector embeddings, so results can match on both exact terms and meaning. The team validated a functional prototype using the Enron Email Dataset as a structural proxy for restricted investigative corpora, establishing a baseline for future empirical evaluation. If that evaluation bears it out, the approach could reduce the time investigators spend manually searching evidence while maintaining human oversight over AI-generated queries.
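To make the human-in-control idea concrete, here is a minimal Python sketch of a review gate that sits between the LLM and the search cluster. It is an illustration under stated assumptions, not the authors' implementation: the function names, the allow-list of top-level keys, and the `approve` and `execute` callbacks are all hypothetical, and the checks cover only basic JSON structure rather than full DSL validation.

```python
import json

# Hypothetical allow-list of top-level keys for a search request body.
ALLOWED_TOP_LEVEL_KEYS = {"query", "size", "from", "sort", "_source", "highlight"}


def parse_llm_query(llm_output: str) -> dict:
    """Parse LLM text as JSON and run basic structural checks.

    Raises ValueError (json.JSONDecodeError is a subclass) on bad input.
    """
    body = json.loads(llm_output)
    if not isinstance(body, dict):
        raise ValueError("query body must be a JSON object")
    if "query" not in body:
        raise ValueError("query body must contain a 'query' clause")
    unexpected = set(body) - ALLOWED_TOP_LEVEL_KEYS
    if unexpected:
        raise ValueError(f"unexpected top-level keys: {sorted(unexpected)}")
    return body


def review_and_run(llm_output, approve, execute):
    """Run the generated query only if the investigator approves it.

    approve: callable that receives the parsed body and returns True or False.
    execute: callable that receives the parsed body, e.g. a thin wrapper
             around a search client (assumed, not shown here).
    Returns the result of execute, or None when the query is rejected.
    """
    body = parse_llm_query(llm_output)
    if not approve(body):
        return None
    return execute(body)
```

In a sketch like this, `approve` would be wired to a user interface that shows the generated query to the investigator, so nothing reaches the index without an explicit decision.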
- System translates natural-language queries into OpenSearch DSL via LLMs
- Combines BM25 lexical search with semantic vector embeddings for hybrid retrieval
- Prototype validated using Enron Email Dataset as proxy for investigative data
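The hybrid retrieval step summarized above can be sketched as a single OpenSearch request body built in Python. This is an assumption-laden illustration: the field names `body`, `chunks`, and `chunks.embedding` are hypothetical, and the summary does not say whether the authors combine scores with a plain `bool` query, as here, or with some other mechanism.

```python
def build_hybrid_query(text, query_vector, k=10, size=10):
    """Build a request body that combines lexical and vector search.

    Assumed index layout (hypothetical): a text field `body`, and a nested
    field `chunks` whose `chunks.embedding` subfield holds one vector per chunk.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    # Lexical leg: a match query on a text field, which
                    # OpenSearch scores with BM25 by default.
                    {"match": {"body": {"query": text}}},
                    # Semantic leg: k-NN search over embeddings stored in
                    # nested chunk objects; the best chunk scores the document.
                    {
                        "nested": {
                            "path": "chunks",
                            "score_mode": "max",
                            "query": {
                                "knn": {
                                    "chunks.embedding": {
                                        "vector": list(query_vector),
                                        "k": k,
                                    }
                                }
                            },
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        },
    }
```

A `bool` query adds the two scores without normalizing them, so one leg can dominate the ranking. OpenSearch also offers a dedicated `hybrid` query used with a score-normalization search pipeline, which may be the better fit in practice.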
Why It Matters
Could speed up criminal investigations by letting analysts query evidence in plain English, lowering technical barriers while keeping a human in control of AI-generated queries.