A Fusion of context-aware based BanglaBERT and Two-Layer Stacked LSTM Framework for Multi-Label Cyberbullying Detection
New hybrid AI architecture tackles overlapping abuse types in Bangla with transformer-LSTM fusion and class balancing.
A research team from Bangladesh has published a novel AI architecture for detecting multiple, overlapping forms of cyberbullying in the Bangla language. The paper, "A Fusion of context-aware based BanglaBERT and Two-Layer Stacked LSTM Framework for Multi-Label Cyberbullying Detection," addresses a critical gap in content moderation for low-resource languages. Most existing approaches rely on single-label classification, which cannot capture cases where a single online comment contains threats, hate speech, and harassment simultaneously. The researchers argue that multi-label detection is both more realistic and essential for effective intervention, especially in languages like Bangla where robust pre-trained models are scarce.
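The multi-label framing can be illustrated concretely: instead of assigning each comment exactly one class, the model makes an independent binary decision per abuse category. A minimal sketch in Python, using the four category names from the paper (the example annotation itself is hypothetical):

```python
# Each comment receives a multi-hot vector: one independent flag per category.
CATEGORIES = ["cyberbullying", "sexual_harassment", "threat", "spam"]

def encode_labels(active):
    """Map a set of active category names to a multi-hot vector."""
    return [1 if c in active else 0 for c in CATEGORIES]

# A single comment can carry several abuse types at once -- exactly the case
# a single-label classifier cannot represent. (Hypothetical annotation.)
labels = encode_labels({"cyberbullying", "threat"})
print(labels)  # [1, 0, 1, 0]
```

A single-label scheme would force the annotator to pick one of the two active categories and discard the other, which is the information loss the authors argue against.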
The proposed solution is a hybrid model that fuses the contextual understanding of BanglaBERT-Large, a transformer model, with the sequential dependency modeling of a two-layer stacked LSTM, so that each component offsets the other's limitations. The model was fine-tuned and rigorously evaluated on a public multi-label Bangla dataset covering four abuse categories. To handle significant class imbalance, the team applied sampling strategies to rebalance the training data. Performance was assessed with a comprehensive suite of metrics, including accuracy, precision, recall, F1-score, Hamming loss, Cohen's kappa, and AUC-ROC, under 5-fold cross-validation to confirm that the architecture generalizes. This work provides a valuable blueprint for building more nuanced and effective content-safety systems in linguistically diverse digital spaces.
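Of the metrics listed, Hamming loss is the one most specific to the multi-label setting: it counts the fraction of individual label decisions that are wrong, rather than scoring whole examples as right or wrong. A small self-contained sketch (the predictions below are toy values, not the paper's results):

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / total

# Toy multi-hot targets and predictions over 4 categories (illustrative only).
y_true = [[1, 0, 1, 0],
          [0, 0, 0, 1]]
y_pred = [[1, 0, 0, 0],   # one true label missed
          [0, 1, 0, 1]]   # one false positive
print(hamming_loss(y_true, y_pred))  # 2 wrong out of 8 slots -> 0.25
```

Because a comment with three of four labels correct still earns partial credit, Hamming loss rewards models that get most overlapping categories right, where strict per-example accuracy would score them zero.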
- Hybrid architecture fuses BanglaBERT-Large (for context) with a two-layer stacked LSTM (for sequence) to detect overlapping abuse types.
- Tackles multi-label classification across four categories in Bangla, a low-resource language: cyberbullying, sexual harassment, threat, and spam.
- Employs sampling strategies to address class imbalance and uses 5-fold cross-validation with 7+ metrics for robust evaluation.
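The 5-fold protocol in the bullets above can be sketched without any ML library: partition the example indices into five folds, train on four, evaluate on the held-out fold, and average the metrics. (The sequential fold assignment below is an illustrative assumption; the paper does not specify its exact splitting scheme.)

```python
def kfold_splits(n_examples, k=5):
    """Yield (train_indices, test_indices) for each of k sequential folds."""
    indices = list(range(n_examples))
    fold_size, rem = divmod(n_examples, k)
    start = 0
    for fold in range(k):
        size = fold_size + (1 if fold < rem else 0)  # spread the remainder
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        start += size
        yield train, test

# Every example lands in exactly one test fold across the 5 splits.
splits = list(kfold_splits(10, k=5))
print([test for _, test in splits])  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```

Averaging each metric over the five held-out folds, as the authors do, gives a less optimistic estimate than a single train/test split.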
Why It Matters
Provides a scalable blueprint for nuanced content safety in low-resource languages, moving beyond simplistic single-label detection.