Research & Papers

A Cloud-Native Architecture for Human-in-Control LLM-Assisted OpenSearch in Investigative Settings

New architecture translates natural language into search queries for criminal investigations.

Deep Dive

Researchers Benjamin Puhani, Kai Brehmer, and Malte Prieß have designed a cloud-native microservice architecture for integrating Large Language Models into investigative searches using OpenSearch. The system, presented at the CLOUD COMPUTING 2026 conference, aims to bridge the semantic gap between natural-language investigative intent and technical search logic. It is designed for private-cloud deployments, keeping sensitive evidence within agency-controlled infrastructure while scaling to the large volumes of unstructured data typical of criminal investigations.

The architecture employs a human-in-control workflow in which LLMs translate plain-English queries into syntactically valid OpenSearch Domain-Specific Language (DSL) expressions. A hybrid retrieval strategy combines BM25-based lexical search with nested semantic vector embeddings for more accurate results. The team validated a functional prototype using the Enron Email Dataset as a structural proxy for restricted investigative corpora, establishing a baseline for future empirical evaluation. This approach could significantly reduce the time investigators spend manually searching evidence, while maintaining human oversight over AI-generated queries.
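To make the hybrid retrieval strategy concrete, the sketch below builds an OpenSearch query body that pairs a BM25 `match` clause with a nested k-NN clause over per-chunk embeddings. The field names (`body`, `chunks`, `chunks.embedding`) and parameters are illustrative assumptions, not the authors' actual schema:

```python
def hybrid_query(text: str, vector: list[float], k: int = 10) -> dict:
    """Sketch a bool query combining a BM25 lexical leg with a nested
    k-NN leg over chunk-level embeddings (field names are assumed)."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Lexical leg: classic BM25 scoring on the document body
                    {"match": {"body": text}},
                    # Semantic leg: nested k-NN search over chunk embeddings
                    {
                        "nested": {
                            "path": "chunks",
                            "query": {
                                "knn": {
                                    "chunks.embedding": {
                                        "vector": vector,
                                        "k": k,
                                    }
                                }
                            },
                        }
                    },
                ]
            }
        }
    }
```

Placing both legs under a single `should` lets OpenSearch blend lexical and semantic scores; in practice the paper's prototype may weight or normalize the two legs differently.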

Key Points
  • System translates natural-language queries into OpenSearch DSL via LLMs
  • Combines BM25 lexical search with semantic vector embeddings for hybrid retrieval
  • Prototype validated using Enron Email Dataset as proxy for investigative data

Why It Matters

Speeds up criminal investigations by letting analysts query evidence in plain English, reducing technical barriers.