Research & Papers

EfficientSign: An Attention-Enhanced Lightweight Architecture for Indian Sign Language Recognition

A new lightweight model matches ResNet18's performance for sign language recognition while using less than half the parameters.

Deep Dive

Researchers Rishabh Gupta and Shravya R. Nalla have introduced EfficientSign, a novel lightweight neural network architecture designed specifically for recognizing Indian Sign Language (ISL) alphabets. The model builds upon the EfficientNet-B0 backbone but incorporates two key attention modules: a Squeeze-and-Excitation block for channel-wise feature recalibration and a spatial attention layer to focus computational resources on hand gesture regions. This design allows the model to achieve a remarkable 99.94% accuracy (±0.05%) on a dataset of 12,637 images across all 26 ISL alphabet classes, as validated through rigorous 5-fold cross-validation.
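The two attention mechanisms can be sketched in a few lines. The sketch below is a minimal NumPy illustration, not the paper's implementation: the weights are random stand-ins for learned parameters, the reduction ratio is an assumption, and the spatial gate here uses a simple sum of pooled maps where a real layer would apply a learned convolution.

```python
import numpy as np

def se_block(x, reduction=4):
    """Squeeze-and-Excitation: recalibrate channels of a (C, H, W) feature map.
    Weights are random here for illustration; in EfficientSign they are learned."""
    c, _, _ = x.shape
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    z = x.mean(axis=(1, 2))                     # squeeze: global average pool -> (C,)
    s = np.maximum(w1 @ z, 0.0)                 # excitation: bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))     # sigmoid gate per channel, in (0, 1)
    return x * gate[:, None, None]              # scale: reweight each channel

def spatial_attention(x):
    """Spatial attention: a sigmoid map over (H, W) from channel-pooled statistics,
    emphasizing regions (e.g. the hand) over background."""
    avg = x.mean(axis=0)                        # (H, W) average over channels
    mx = x.max(axis=0)                          # (H, W) max over channels
    att = 1.0 / (1.0 + np.exp(-(avg + mx)))    # stand-in for a learned conv on [avg; max]
    return x * att[None, :, :]

# Toy feature map standing in for an EfficientNet-B0 activation
feat = np.random.default_rng(1).standard_normal((8, 4, 4))
out = spatial_attention(se_block(feat))         # shape preserved: (8, 4, 4)
```

Because both gates lie in (0, 1), the modules only attenuate features, which is what lets them redirect capacity toward the hand-gesture region without adding significant parameter count.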

Crucially, EfficientSign effectively matches the near-perfect 99.97% accuracy of the much larger ResNet18 model (the gap falls within its ±0.05% cross-validation margin) while using only 4.2 million parameters—a 62% reduction from ResNet18's 11.2 million. This parameter efficiency is the headline result, translating directly to lower computational cost and memory footprint. The researchers also demonstrated the strength of the learned features by extracting 1,280-dimensional vectors from their model and feeding them into classical classifiers such as SVM, which achieved 99.63% accuracy, far surpassing the 92% benchmark set by older SURF-based methods in 2015.
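The feature-extraction experiment follows a standard pattern: freeze the trained backbone, take its penultimate 1,280-dimensional embedding for each image, and fit a classical classifier on those vectors. A minimal sketch using scikit-learn, with random synthetic "features" standing in for the real embeddings (the two-class setup and the separation shift are illustrative assumptions, not the paper's 26-class data):

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for backbone embeddings: 1,280-d vectors for two classes.
rng = np.random.default_rng(0)
n, dim = 100, 1280
X = rng.standard_normal((2 * n, dim))
X[:n] += 2.0                                   # shift class 0 so classes are separable
y = np.array([0] * n + [1] * n)

clf = SVC(kernel="linear").fit(X[::2], y[::2])  # train on even rows
acc = clf.score(X[1::2], y[1::2])               # evaluate on held-out odd rows
```

In the real pipeline the feature matrix `X` would come from a forward pass through EfficientSign with its classification head removed; the high SVM accuracy reported suggests the learned embeddings are already nearly linearly separable.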

The work signals a shift from resource-intensive models and manual feature engineering to efficient, attention-driven architectures that are practical for deployment. By proving that high accuracy doesn't require massive scale, EfficientSign paves the way for real-time, on-device ISL recognition applications, potentially increasing accessibility for the deaf and hard-of-hearing community in India through smartphone integration.

Key Points
  • Achieves 99.94% accuracy on 26 Indian Sign Language alphabet classes using 12,637 images.
  • Uses only 4.2M parameters—62% fewer than ResNet18—enabling efficient on-device deployment.
  • Outperforms 2015 SURF-based methods (92% accuracy) and enables classical classifiers like SVM to reach 99.63%.

Why It Matters

Enables accurate, real-time sign language translation on smartphones, breaking down communication barriers for millions.