AI Safety

Secure On-Premise Deployment of Open-Weights Large Language Models in Radiology: An Isolation-First Architecture with Prospective Pilot Evaluation

Secure on-premise LLM deployment clears regulatory hurdles for patient data processing

Deep Dive

Researchers at a German university hospital with over 10,000 employees have successfully deployed an on-premise, open-weights large language model (LLM) for radiology, overcoming stringent regulatory barriers. The system, detailed in a new arXiv paper, uses an isolation-first architecture with containerized inference, strict network segmentation, host-enforced egress filtering, and active isolation monitoring to prevent unauthorized external connectivity. This setup allowed the hospital to process unanonymized protected health information (PHI) after securing approval from clinic management, compliance, data protection, and information security officers. The deployment package includes automated isolation and hardening tests and is publicly available.
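The paper's active isolation monitoring is not reproduced in code here, but the idea can be sketched: a watchdog compares observed outbound connections against an allowlist of internal subnets and flags anything else. The subnets and function names below are hypothetical illustrations, not the hospital's actual configuration.

```python
import ipaddress

# Hypothetical allowlist: only internal subnets may be contacted.
ALLOWED_SUBNETS = [
    ipaddress.ip_network("10.0.0.0/8"),   # assumed internal hospital network
    ipaddress.ip_network("127.0.0.0/8"),  # loopback
]

def is_allowed(dest_ip: str) -> bool:
    """Return True if the destination lies inside an allowed subnet."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in ALLOWED_SUBNETS)

def flag_violations(destinations: list[str]) -> list[str]:
    """Given observed outbound destination IPs, return those violating isolation."""
    return [ip for ip in destinations if not is_allowed(ip)]
```

In a real deployment this check would sit behind host-enforced egress filtering (e.g., firewall rules), with the monitor serving as a second, auditable layer rather than the sole control.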

During a one-week pilot phase, 22 residents and radiologists used the system, which ran the open-weights DeepSeek-R1 model via vLLM, with 10 predefined prompt templates for tasks like report corrections, simplifications, and radiology guideline recommendations. Text-anchored tasks received the highest utility ratings on a 0-10 Likert scale, while open-ended conclusion generation based on findings resulted in the highest frequency of critical errors, such as clinically relevant hallucinations or omissions. The system was rated stable and user-friendly, marking a significant step toward using open-weights LLMs as an official clinical service in a European hospital setting.
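The study's ten prompt templates are not reproduced here; as a minimal sketch of the pattern, a predefined template per task can be filled with the report text before being sent to the locally served model. The template wording below is hypothetical, not taken from the paper.

```python
# Hypothetical task templates in the spirit of the pilot's predefined prompts;
# the actual wording used in the study is not reproduced here.
TEMPLATES = {
    "correct": (
        "Correct spelling and grammar in this radiology report "
        "without changing its meaning:\n{report}"
    ),
    "simplify": (
        "Rewrite this radiology report in plain language "
        "for the patient:\n{report}"
    ),
}

def build_prompt(task: str, report: str) -> str:
    """Fill the predefined template for the given task with the report text."""
    if task not in TEMPLATES:
        raise KeyError(f"unknown task: {task}")
    return TEMPLATES[task].format(report=report)
```

The resulting prompt would then be submitted to the on-premise inference endpoint (vLLM exposes an OpenAI-compatible API), keeping the PHI inside the isolated network segment.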

Key Points
  • Isolation-first architecture uses strict network segmentation and egress filtering to process unanonymized PHI legally
  • DeepSeek-R1 model served via vLLM in a containerized stack with automated hardening tests
  • Pilot with 22 residents and radiologists found text-anchored tasks (report corrections, guideline recommendations) most useful, while open-ended conclusion generation produced the most critical errors

Why It Matters

Demonstrates a regulatory-approved pathway for deploying open-weights LLMs in clinical settings without cloud dependencies.