OncoAgent: Open-source dual-tier AI for privacy-preserving oncology decisions

Hugging Face Blog May 10, 2026

⚡ Fine-tuned on 266K cases in about 50 minutes on AMD hardware, no cloud needed.

Deep Dive

OncoAgent combines a dual-tier fine-tuned LLM architecture with a multi-agent LangGraph topology. An additive complexity scorer routes each clinical query to either a 9B-parameter speed-optimized model (Tier 1) or a 27B deep-reasoning model (Tier 2). Both models were fine-tuned via QLoRA with the Unsloth framework on a corpus of 266,854 real and synthetically generated oncological cases, using AMD Instinct MI300X hardware (192 GB HBM3). Sequence packing on the MI300X enabled full-dataset fine-tuning in approximately 50 minutes, reported as a 56× throughput acceleration over API-based generation. The system also implements a four-stage Corrective RAG (CRAG) pipeline over 70+ physician-grade NCCN and ESMO guidelines.
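To make the routing idea concrete, here is a minimal sketch of an additive complexity scorer choosing between the two tiers. The article does not publish the real features, weights, or threshold, so every term, weight, and cut-off below is a hypothetical stand-in, as are the names `complexity_score` and `route`.

```python
from dataclasses import dataclass

# Hypothetical signal terms and weights; the real scorer's features are not published.
WEIGHTS = {
    "metastatic": 2,
    "recurrence": 2,
    "comorbidity": 1,
    "clinical trial": 1,
    "contraindication": 1,
}
LENGTH_THRESHOLD_WORDS = 60  # assumed: very long queries add one point
TIER2_THRESHOLD = 3          # assumed cut-off for escalating to the larger model


@dataclass
class Route:
    tier: int
    model: str
    score: int


def complexity_score(query: str) -> int:
    """Add up the weight of every signal term present in the query."""
    text = query.lower()
    score = sum(weight for term, weight in WEIGHTS.items() if term in text)
    if len(text.split()) > LENGTH_THRESHOLD_WORDS:
        score += 1
    return score


def route(query: str) -> Route:
    """Send low-scoring queries to Tier 1 and high-scoring ones to Tier 2."""
    score = complexity_score(query)
    if score >= TIER2_THRESHOLD:
        return Route(tier=2, model="27B deep-reasoning model", score=score)
    return Route(tier=1, model="9B speed-optimized model", score=score)


if __name__ == "__main__":
    print(route("What is the standard staging workup for early breast cancer?"))
    print(route("Metastatic recurrence with renal comorbidity: what are the options?"))
```

Because the score is a plain sum, each contribution can be logged and inspected, which fits the auditability goal the article emphasizes.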

Safety is enforced by a three-layer reflexion safety validator that maintains a strict Zero-PHI policy: no protected health information ever leaves the hospital network. After a fix, CRAG document grading achieved a 100% success rate with a mean RAG confidence score of 2.3+. The complete system is fully open source and deployable on-premises, which removes the dependency on proprietary cloud APIs and preserves patient data sovereignty. By decomposing clinical reasoning across eight specialized LangGraph nodes, OncoAgent is designed to produce auditable, grounded recommendations with reduced hallucination risk, suitable for high-stakes oncology triage and treatment pathway decisions.
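The document grading step of a Corrective RAG pipeline can be sketched as follows. This is a generic illustration of the grade-then-correct pattern, not OncoAgent's implementation: the grade scale, the `MIN_GRADE` cut-off, the retry count, and the `retrieve`, `grader`, and `rewrite` callables are all assumptions, and the article does not define how its confidence score is computed.

```python
from typing import Callable, List, Tuple

MIN_GRADE = 2.0  # assumed relevance cut-off; the article does not state one


def grade_documents(
    query: str, docs: List[str], grader: Callable[[str, str], float]
) -> Tuple[List[str], float]:
    """Grade each document and keep those at or above MIN_GRADE.

    Returns the kept documents and their mean grade (0.0 if none are kept).
    """
    graded = [(doc, grader(query, doc)) for doc in docs]
    kept = [(doc, grade) for doc, grade in graded if grade >= MIN_GRADE]
    if not kept:
        return [], 0.0
    mean_grade = sum(grade for _, grade in kept) / len(kept)
    return [doc for doc, _ in kept], mean_grade


def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], List[str]],
    grader: Callable[[str, str], float],
    rewrite: Callable[[str], str],
    max_retries: int = 1,
) -> Tuple[List[str], float]:
    """Retrieve and grade; if nothing passes, rewrite the query and try again."""
    current = query
    for _ in range(max_retries + 1):
        kept, confidence = grade_documents(current, retrieve(current), grader)
        if kept:
            return kept, confidence
        current = rewrite(current)  # the corrective step
    return [], 0.0
```

In a real deployment the grader would typically be a model call and the rewrite step a query reformulation. Both are left as injectable functions here so the control flow can be tested without any model.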

Key Points

Dual-tier LLM (9B for speed, 27B for deep reasoning) routes queries via an additive complexity scorer.
Trained on 266,854 real and synthetic oncological cases in ~50 minutes on AMD MI300X using QLoRA and Unsloth, achieving 56× throughput over API-based generation.
Four-stage Corrective RAG pipeline over 70+ guidelines with 100% document grading success; three-layer reflexion safety validator enforces a Zero-PHI policy.

Why It Matters

Enables hospitals to deploy safe, privacy-compliant oncology AI without cloud dependency, reducing hallucination risks.
