Abnormalities and Disease Detection in Gastro-Intestinal Tract Images
A new PhD thesis presents a hybrid AI system that classifies gastrointestinal (GI) images at 41 FPS with 99% accuracy on the HyperKvasir dataset.
A new PhD thesis by researchers Zeshan Khan and Muhammad Atif Tahir, published on arXiv, introduces a multifaceted AI framework designed to tackle the complex challenge of detecting diseases in gastrointestinal (GI) tract images. The research addresses key hurdles in medical image analysis, including the high diversity of abnormalities and the computational demands of real-time clinical applications. The work systematically progresses from exploring ultra-fast traditional methods to developing optimized deep learning models, culminating in a hybrid system that balances speed and precision.
The research first explored texture-based feature extraction, which reached processing speeds of over 4000 frames per second (FPS) with 0.98 accuracy on the Kvasir V2 dataset. It then moved to deep learning, where an optimized model combined with data bagging improved performance on the larger HyperKvasir dataset. The final, streamlined neural network integrates texture analysis based on local binary patterns and uses a learned threshold to handle inter-class similarity. This system is fast enough for real-time use: it processes images at 41 FPS (about 24 ms per frame) while reaching 0.99 accuracy and a 0.91 F1-score on HyperKvasir.
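To make the local binary pattern (LBP) idea concrete, here is a minimal sketch of a basic 8-neighbour LBP histogram in pure Python. It is not the thesis's implementation: the function name `lbp_histogram`, the neighbour ordering, and the 256-bin layout are assumptions chosen for illustration, and a practical system would use a vectorized library version.

```python
from typing import List


def lbp_histogram(image: List[List[int]]) -> List[float]:
    """Return a normalized 256-bin histogram of basic 8-neighbour LBP codes.

    `image` is a 2D grayscale image given as a list of rows.
    Hypothetical helper for illustration only.
    """
    # Neighbour offsets (row, col), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    hist = [0] * 256

    # Skip the one-pixel border so every pixel has eight neighbours.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright
                # as the center pixel.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1

    total = sum(hist)
    return [count / total for count in hist] if total else [0.0] * 256


# A flat patch: every neighbour equals the center, so all codes are 255.
flat = [[5] * 4 for _ in range(4)]
flat_hist = lbp_histogram(flat)
```

The resulting histogram is a fixed-length texture descriptor that can be fed to a classifier. Each pixel needs only eight comparisons, which is one reason texture features of this kind can be computed very quickly.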
The thesis also proposes two segmentation tools to extend the system's usability, particularly in scenarios with lower frame rates. These tools use depth-wise separable convolutions and neural network ensembles to improve detection. Taken together, the research lays out an adaptable pipeline that runs from traditional computer vision through deep learning to ensemble approaches, aimed at automated, real-time GI image analysis in clinical settings.
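As a rough illustration of why depth-wise separable convolution suits speed-sensitive models, the sketch below compares weight counts for a standard convolution and its depth-wise separable counterpart. The kernel size and channel counts are illustrative assumptions, not values from the thesis, and biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard 2D convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise separable convolution (biases ignored)."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Assumed example layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)          # 73728
separable = depthwise_separable_params(3, 64, 128)   # 8768
ratio = separable / standard                         # equals 1/128 + 1/9
print(f"standard={standard}, separable={separable}, ratio={ratio:.3f}")
```

For this assumed layer, the separable version needs roughly 12% of the weights of the standard one. In general the ratio is 1/c_out + 1/kernel².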
- Hybrid AI system achieves 99% accuracy and 0.91 F1-score on the HyperKvasir dataset while running at 41 FPS for real-time analysis.
- Early texture-based methods demonstrated extreme speed, processing over 4000 FPS with 98% accuracy on the Kvasir V2 dataset.
- Proposes two novel segmentation tools using Depth-Wise Separable Convolution and neural network ensembles to improve detection in low-FPS scenarios.
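The gap between 99% accuracy and a 0.91 F1-score reported above is typical of imbalanced medical datasets, where rare classes barely affect accuracy but weigh heavily on a class-averaged F1. The sketch below shows the effect on toy data. The labels and counts are invented for illustration, and macro averaging is an assumption, since this summary does not say which F1 variant the thesis reports.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): 98 common-class images, 2 rare-class images,
# with one of the rare images misclassified as the common class.
y_true = ["normal"] * 98 + ["polyp"] * 2
y_pred = ["normal"] * 98 + ["polyp", "normal"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, macro F1={f1:.3f}")
```

A single missed rare-class image leaves accuracy at 0.99 but pulls the macro F1 well below it, which is why both metrics are worth reporting together.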
Why It Matters
This work could enable faster, more accurate automated screening for gastrointestinal diseases, potentially reducing diagnostic delays and improving patient outcomes.