A Cloud-Native Architecture for Human-in-Control LLM-Assisted OpenSearch in Investigative Settings
New architecture translates natural language into search queries for criminal investigations.
Researchers Benjamin Puhani, Kai Brehmer, and Malte Prieß have designed a cloud-native microservice architecture for integrating Large Language Models into investigative searches using OpenSearch. The system, presented at the CLOUD COMPUTING 2026 conference, aims to bridge the semantic gap between natural-language investigative intent and technical search logic. It is designed for private-cloud deployments, keeping sensitive case data within agency infrastructure while scaling to the large volumes of unstructured evidence typical of criminal investigations.
The architecture employs a human-in-control workflow in which LLMs translate plain-English queries into syntactically valid OpenSearch Domain-Specific Language (DSL) expressions that investigators review before execution. A hybrid retrieval strategy combines BM25-based lexical search with nested semantic vector embeddings, so results capture both exact keyword matches and semantically related content. The team validated a functional prototype using the Enron Email Dataset as a structural proxy for restricted investigative corpora, establishing a baseline for future empirical evaluation. This approach could significantly reduce the time investigators spend manually searching evidence, while maintaining human oversight over AI-generated queries.
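A hybrid query of the kind described might look as follows. This is a minimal sketch, not the paper's actual implementation: the field names (`body`, `body_chunks.vector`), the embedding dimension, and the use of OpenSearch's `hybrid` query type are assumptions for illustration.

```python
# Sketch of a hybrid OpenSearch query combining a BM25 lexical clause
# with a k-NN clause over nested per-chunk embeddings.
# Field names and the query shape are illustrative assumptions.

def build_hybrid_query(user_text: str, query_vector: list[float], k: int = 10) -> dict:
    """Build an OpenSearch DSL body with a lexical and a semantic leg."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical leg: classic BM25 scoring over the document body.
                    {"match": {"body": {"query": user_text}}},
                    # Semantic leg: k-NN over embeddings stored in a nested
                    # field, so long documents can match chunk-by-chunk.
                    {
                        "nested": {
                            "path": "body_chunks",
                            "query": {
                                "knn": {
                                    "body_chunks.vector": {
                                        "vector": query_vector,
                                        "k": k,
                                    }
                                }
                            },
                            "score_mode": "max",
                        }
                    },
                ]
            }
        },
    }

# Example: a plain-English lead translated into a hybrid DSL body
# (the 384-dim zero vector stands in for a real embedding).
query = build_hybrid_query("wire transfers to offshore accounts", [0.0] * 384)
```

In a real deployment, the two legs' scores would typically be normalized and weighted via a search pipeline before ranking.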
- System translates natural-language queries into OpenSearch DSL via LLMs
- Combines BM25 lexical search with semantic vector embeddings for hybrid retrieval
- Prototype validated using Enron Email Dataset as proxy for investigative data
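The human-in-control workflow summarized above can be sketched as a simple approval gate. Here `generate_dsl`, `execute_search`, and `approve` are hypothetical stand-ins for the system's LLM translation, OpenSearch client, and investigator-review components; none of these names come from the paper.

```python
# Sketch of a human-in-control gate: the LLM-generated DSL is executed
# only after an investigator explicitly approves it.

import json
from typing import Callable, Optional


def run_with_review(
    nl_query: str,
    generate_dsl: Callable[[str], dict],
    execute_search: Callable[[dict], dict],
    approve: Callable[[dict], bool],
) -> Optional[dict]:
    """Translate a natural-language query to DSL, then gate execution
    on explicit human approval of the generated query."""
    dsl = generate_dsl(nl_query)   # LLM translation step (hypothetical)
    json.dumps(dsl)                # cheap sanity check: DSL is serializable
    if not approve(dsl):           # the investigator stays in control
        return None
    return execute_search(dsl)


# Usage with stub components standing in for the real services.
def fake_generate(q: str) -> dict:
    return {"query": {"match": {"body": {"query": q}}}}

def fake_execute(dsl: dict) -> dict:
    return {"hits": 3}

result = run_with_review("payments in 2001", fake_generate, fake_execute,
                         approve=lambda dsl: True)
```

The design point is that the LLM never triggers a search directly: every generated query passes through a review step, which is what keeps the human in control of evidence handling.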
Why It Matters
Speeds up criminal investigations by letting analysts query evidence in plain English, reducing technical barriers.