Research & Papers

A Cloud-Native Architecture for Human-in-Control LLM-Assisted OpenSearch in Investigative Settings

New architecture translates natural language into search queries for criminal investigations.

Deep Dive

Researchers Benjamin Puhani, Kai Brehmer, and Malte Prieß have designed a cloud-native microservice architecture for integrating Large Language Models into investigative searches using OpenSearch. The system, presented at the CLOUD COMPUTING 2026 conference, aims to bridge the semantic gap between natural-language investigative intent and technical search logic. It is designed for private-cloud deployments, keeping sensitive evidence within agency-controlled infrastructure while scaling to the large volumes of unstructured data typical of criminal investigations.

The architecture employs a human-in-control workflow in which LLMs translate plain-English queries into syntactically valid OpenSearch Domain-Specific Language (DSL) expressions. A hybrid retrieval strategy combines BM25-based lexical search with nested semantic vector embeddings for more accurate results. The team validated a functional prototype using the Enron Email Dataset as a structural proxy for restricted investigative corpora, establishing a baseline for future empirical evaluation. This approach could significantly reduce the time investigators spend manually searching evidence, while maintaining human oversight over AI-generated queries.
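To make the hybrid retrieval strategy concrete, the sketch below builds an OpenSearch query body that pairs a BM25 `match` clause with a nested k-NN clause over per-chunk embeddings. The field names (`body`, `chunks`, `chunks.embedding`) and parameters are illustrative assumptions, not the authors' actual schema:

```python
def hybrid_query(text: str, vector: list[float], k: int = 10) -> dict:
    """Sketch a bool query combining a BM25 lexical leg with a nested
    k-NN leg over chunk-level embeddings (field names are assumed)."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Lexical leg: classic BM25 scoring on the document body
                    {"match": {"body": text}},
                    # Semantic leg: nested k-NN search over chunk embeddings
                    {
                        "nested": {
                            "path": "chunks",
                            "query": {
                                "knn": {
                                    "chunks.embedding": {
                                        "vector": vector,
                                        "k": k,
                                    }
                                }
                            },
                        }
                    },
                ]
            }
        }
    }
```

Placing both legs under a single `should` lets OpenSearch blend lexical and semantic scores; in practice the paper's prototype may weight or normalize the two legs differently.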

Key Points
  • System translates natural-language queries into OpenSearch DSL via LLMs
  • Combines BM25 lexical search with semantic vector embeddings for hybrid retrieval
  • Prototype validated using Enron Email Dataset as proxy for investigative data

Why It Matters

Speeds up criminal investigations by letting analysts query evidence in plain English, reducing technical barriers.