Research & Papers

RARE disease detection from Capsule Endoscopic Videos based on Vision Transformers

A new AI system classifies 17 gastrointestinal landmarks and pathologies from pill-cam videos using Vision Transformer models.

Deep Dive

A team of researchers has published a paper on arXiv describing an AI system that uses Vision Transformers (ViT) to automatically detect rare diseases in capsule endoscopic videos. The work addresses the Gastro Competition task of multi-label classification, with the model fine-tuned to identify 17 labels across the gastrointestinal tract. These labels include anatomical landmarks such as the z-line and pylorus, as well as pathologies such as active bleeding, angiectasia, erosion, and ulcers. The base model is Google's Vision Transformer (ViT), configured with a batch size of 16 and operating on 224x224 resolution images.
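To make "multi-label" concrete: unlike single-label classification, each label gets its own independent probability, so one frame can carry several labels at once. The sketch below is a minimal illustration of that decoding step, not the authors' code. The function name `decode_multilabel`, the 0.5 threshold, and the logit values are assumptions made for illustration, and only labels named in this article are listed rather than the full set of 17.

```python
import math

# Subset of the 17 labels, limited to names mentioned in this article.
LABELS = ["z-line", "pylorus", "active bleeding", "angiectasia", "erosion", "ulcer"]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Return every label whose independent sigmoid probability meets the threshold.

    The threshold of 0.5 is an assumed default, not a value from the paper.
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return [name for name, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Hypothetical logits for a single frame (made-up numbers, not model output).
frame_logits = [-3.0, 2.2, 0.4, -1.5, -0.2, 1.1]
print(decode_multilabel(frame_logits))  # ['pylorus', 'active bleeding', 'ulcer']
```

Here a landmark (pylorus) and two pathologies are reported for the same frame, which a softmax-based single-label classifier could not express.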

On a test set of three videos, the model reached an overall mean Average Precision (mAP) of 0.0205 at an Intersection over Union (IoU) threshold of 0.5, and 0.0196 at the stricter threshold of 0.95. These scores are low in absolute terms, and they serve as a baseline for a difficult multi-label classification task on unstructured video with significant class imbalance. The research is a proof of concept for automating the analysis of lengthy capsule endoscopy recordings, which could assist clinicians by flagging potential areas of concern across hours of footage. The approach aims to reduce diagnostic oversight and improve the efficiency of screening for rare gastrointestinal conditions.
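The article does not spell out how IoU is computed for this task. One plausible reading, assumed here, is temporal IoU: a predicted run of frames counts as correct for a label only if it overlaps a ground-truth run by at least the threshold. The sketch below illustrates that reading with a simple non-interpolated average precision for a single label. The function names, the matching rule, and all segment values are hypothetical and are not the competition's official scoring code.

```python
def temporal_iou(a, b):
    """IoU of two inclusive frame ranges given as (start, end)."""
    inter = max(min(a[1], b[1]) - max(a[0], b[0]) + 1, 0)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def average_precision(predictions, truths, iou_threshold):
    """Non-interpolated AP for one label.

    predictions: list of (score, (start, end)); truths: list of (start, end).
    Each ground-truth segment can be matched by at most one prediction.
    """
    if not truths:
        return 0.0
    matched = set()
    hits = 0
    precision_sum = 0.0
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    for rank, (_, segment) in enumerate(ranked, start=1):
        # Pick the unmatched ground-truth segment with the highest overlap.
        best_iou, best_idx = 0.0, None
        for i, truth in enumerate(truths):
            if i in matched:
                continue
            iou = temporal_iou(segment, truth)
            if iou > best_iou:
                best_iou, best_idx = iou, i
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / len(truths)


# Made-up segments for one label in one video (frame indices).
truths = [(100, 199), (500, 549)]
predictions = [(0.9, (110, 199)), (0.6, (300, 320)), (0.4, (500, 559))]

print(round(average_precision(predictions, truths, 0.5), 4))   # 0.8333
print(round(average_precision(predictions, truths, 0.95), 4))  # 0.0
```

Under this reading, mAP is the mean of the per-label AP values. The example also shows why the 0.95 threshold is stricter: predictions that overlap the truth by 0.9 and 0.83 count at 0.5 but are rejected at 0.95.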

Key Points
  • Uses Google's Vision Transformer (ViT) to classify 17 GI landmarks and diseases from pill-cam video frames at 224x224 resolution.
  • Achieved an overall mean Average Precision (mAP) of 0.0205 at an IoU threshold of 0.5, and 0.0196 at 0.95, on a test set of three endoscopic videos.
  • Targets automated detection of critical conditions such as active bleeding, ulcers, and polyps, with the aim of assisting lengthy video review.

Why It Matters

Automated screening of hours of capsule endoscopy footage could help clinicians spot rare GI diseases faster and reduce diagnostic fatigue, although the reported scores show the approach is still at the proof-of-concept stage.