Open Source

Lessons from deploying RAG bots for regulated industries

A RAG system deployed across construction and mining sites shows query expansion beats chunk size obsession.

Deep Dive

A developer has shared hard-won lessons from deploying a Retrieval-Augmented Generation (RAG) AI assistant for regulated industries in Australia, including construction sites, aged care facilities, and mining operations. The system, powered by Anthropic's Claude Haiku and ChromaDB, was built to answer complex compliance questions. The key insight was that query expansion—generating four alternative phrasings of a user's query—mattered more for retrieval quality than the commonly obsessed-over chunk size. This was especially critical for domain-specific jargon where phrasing varied. The architecture also forced-inclusion of documents based on title matches, ensuring policies like 'FIFO' were always retrieved when mentioned.

Security and operational design proved equally vital. The team implemented a three-layer prompt system with immutable core safety rules, swappable industry personalities, and additive client instructions, preventing 'ignore previous instructions' attacks. Technically, they found local sentence-transformers embeddings (all-MiniLM-L6-v2) performed close enough to OpenAI's ada-002 for significant cost and latency savings. Operationally, they abandoned shared infrastructure for isolated $6/month DigitalOcean droplets per client, eliminating cross-contamination risks and simplifying management. The entire RAG engine is now open-sourced on GitHub for community review and adaptation.

Key Points
  • Query expansion via Claude Haiku generating 4 phrasings boosted retrieval more than tuning chunk size.
  • A three-layer prompt system with immutable safety rules prevented client jailbreaks and 'ignore instructions' attacks.
  • Using local sentence-transformers embeddings and $6/month isolated VMs per client cut costs and improved security.

Why It Matters

Provides a proven, open-source blueprint for building secure, cost-effective enterprise RAG systems in high-stakes regulated environments.