Research & Papers

CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers

New agentic framework mimics radiologists' iterative process, achieving 95% success rate in autonomous tool invocation.

Deep Dive

A research team from ShanghaiTech University and the Shanghai AI Lab has introduced CT-Flow, a novel AI framework that fundamentally changes how Large Vision-Language Models (LVLMs) interpret 3D medical scans like CTs. Unlike current models that perform a single, static analysis, CT-Flow uses the Model Context Protocol (MCP) to act as an 'orchestrator,' dynamically planning and executing sequences of specialized tools—such as segmentation, measurement, and radiomics—in response to complex natural language queries from clinicians. This shift from closed-box inference to an open, tool-aware paradigm more closely mimics the iterative, tool-mediated workflow of human radiologists, bridging a critical gap between AI research and clinical practice.
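The orchestration pattern described above can be sketched in miniature: a registry of specialized tools (standing in for MCP servers) plus a planner that decomposes a clinician's query into a tool sequence and threads each result into a shared context. All names, tool interfaces, and the keyword-based planner here are illustrative assumptions, not CT-Flow's actual implementation; in the paper, the planning role is played by the LVLM itself.

```python
from typing import Callable, Dict, List

class ToolServer:
    """Toy registry mapping tool names to callables, standing in for MCP servers.
    (Hypothetical sketch; not the paper's API.)"""
    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, fn: Callable[[dict], dict]) -> None:
        self._tools[name] = fn

    def invoke(self, name: str, payload: dict) -> dict:
        return self._tools[name](payload)

def plan(query: str) -> List[str]:
    """Toy planner: map keywords in the query to an ordered tool sequence.
    In CT-Flow this decomposition is done dynamically by the orchestrator model."""
    steps: List[str] = []
    if "nodule" in query or "lesion" in query:
        steps.append("segment")
    if "size" in query or "measure" in query:
        steps.append("measure")
    if "texture" in query:
        steps.append("radiomics")
    return steps

def run(server: ToolServer, query: str, scan: dict) -> dict:
    """Execute the planned tool sequence, accumulating results in a shared context."""
    context: dict = {"scan": scan}
    for step in plan(query):
        context[step] = server.invoke(step, context)
    return context

# Register stand-in tools with dummy outputs (real tools would operate on the 3D volume).
server = ToolServer()
server.register("segment", lambda ctx: {"mask": "lung_nodule_mask"})
server.register("measure", lambda ctx: {"diameter_mm": 8.2})

result = run(server, "segment the nodule and measure its size", {"id": "ct_001"})
```

The key design point this illustrates is that each tool's output lands back in a shared context the planner can read, so later steps (e.g. measurement) can condition on earlier ones (e.g. a segmentation mask), mirroring a radiologist's iterative workflow.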

The team also curated CT-FlowBench, the first large-scale benchmark for training and evaluating 3D CT tool-use and multi-step reasoning. Experimental results show CT-Flow achieves state-of-the-art performance, surpassing baseline models by 41% in diagnostic accuracy on standard visual question-answering tasks and demonstrating a 95% success rate in autonomously invoking the correct sequence of tools. This work provides a scalable foundation for integrating autonomous, agentic intelligence into radiology, promising to assist with complex diagnostic workflows by decomposing high-level instructions into precise, automated actions. The paper has been submitted to ACL 2026, signaling a major step toward practical, workflow-integrated AI in medicine.

Key Points
  • Uses Model Context Protocol (MCP) to orchestrate tool sequences like segmentation and measurement, achieving a 95% tool invocation success rate.
  • Introduces CT-FlowBench, the first large-scale instruction-tuning benchmark for 3D CT tool-use and multi-step clinical reasoning.
  • Outperforms standard single-inference models by 41% in diagnostic accuracy on 3D VQA tasks by mimicking radiologists' dynamic workflow.

Why It Matters

Moves medical AI from passive analysis to active clinical assistance, potentially reducing diagnostic errors and streamlining radiologists' complex workflows.