Learning Under Extreme Data Scarcity: Subject-Level Evaluation of Lightweight CNNs for fMRI-Based Prodromal Parkinson's Detection
A 40-subject Parkinson's study shows common AI testing methods produce near-perfect but meaningless results.
A new methodological study by researcher Naimur Rahman exposes a critical flaw in how AI models are typically evaluated for medical imaging tasks with limited data. The paper, titled 'Learning Under Extreme Data Scarcity: Subject-Level Evaluation of Lightweight CNNs for fMRI-Based Prodromal Parkinson's Detection,' demonstrates that commonly used evaluation practices in neuroimaging AI research can produce dangerously misleading results. Using fMRI data from just 40 subjects (20 with prodromal Parkinson's, 20 healthy controls), Rahman shows that when researchers use image-level data splits—where slices from the same subject can appear in both training and test sets—models achieve near-perfect accuracy (over 95%) through information leakage rather than genuine learning. This creates a false sense of model capability that wouldn't translate to real clinical use.
When Rahman enforced strict subject-level evaluation, in which all data from a given subject stays entirely within either the training or the test set, accuracy dropped sharply to 60-81%. The study compared several convolutional neural network architectures, including VGG19, Inception V3, Inception ResNet V2, and MobileNet V1. Surprisingly, the lightweight MobileNet V1, with far fewer parameters, generalized most reliably, outperforming the deeper architectures in this extreme low-data regime. The research indicates that evaluation methodology and appropriate model capacity matter more than architectural complexity when working with scarce medical data. While limited to a single 40-subject cohort without external validation, this case study offers concrete recommendations for more rigorous AI evaluation in healthcare applications.
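The difference between the two evaluation schemes comes down to what unit gets randomized during the split. A minimal sketch of a subject-level split is shown below; the data structure and function names are hypothetical illustrations, not the paper's actual code, but the principle matches what the study enforces: every slice from a given subject lands entirely in train or entirely in test.

```python
import random

def subject_level_split(slices, test_fraction=0.25, seed=0):
    """Split at the subject level so no subject's slices appear in both
    train and test -- the leakage-free scheme the study advocates."""
    subjects = sorted({s["subject"] for s in slices})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [s for s in slices if s["subject"] not in test_subjects]
    test = [s for s in slices if s["subject"] in test_subjects]
    return train, test

# Hypothetical toy data: 4 subjects, 3 fMRI slices each.
slices = [{"subject": f"sub-{i:02d}", "slice": j}
          for i in range(4) for j in range(3)]
train, test = subject_level_split(slices, test_fraction=0.25)

# Subject-level guarantee: train and test share no subjects.
assert {s["subject"] for s in train}.isdisjoint(
    {s["subject"] for s in test})
```

By contrast, the flawed image-level scheme would shuffle the `slices` list directly, letting near-duplicate slices from one subject straddle the split, which is the leakage mechanism behind the inflated 95%+ accuracy figures.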
- Standard image-level data splits caused 95%+ accuracy through information leakage, dropping to 60-81% with proper subject-level evaluation
- Lightweight MobileNet V1 outperformed deeper models like VGG19 and Inception V3 despite having far fewer parameters
- The study used fMRI data from just 40 subjects (20 prodromal Parkinson's cases, 20 controls) from the PPMI cohort
Why It Matters
This exposes fundamental flaws in AI medical research methodology that could lead to dangerously overconfident clinical tools.