Multiclass Hate Speech Detection with RoBERTa-OTA: Integrating Transformer Attention and Graph Convolutional Networks
New AI architecture adds structured knowledge to language models, improving detection of gender-based hate speech by 2.36 percentage points.
Researchers Mahmoud Abusaqer and Jamil Saquer have introduced RoBERTa-OTA, a novel AI architecture designed to tackle the complex challenge of multiclass hate speech detection. The model addresses a key limitation of existing methods, which rely solely on patterns learned from training data, by explicitly integrating structured ontological knowledge. This allows the system to better understand implicit targeting strategies and linguistic variability in social media content, leading to more accurate classification across specific demographic categories like race, religion, and gender.
The architecture combines the text embeddings of the RoBERTa language model with Graph Convolutional Networks (GCNs) to process both textual features and formal domain knowledge. Its key innovation is an 'ontology-guided attention' mechanism, which lets the model focus on semantically relevant ontology concepts when classifying a post. Evaluated on 39,747 balanced samples, RoBERTa-OTA achieved a state-of-the-art 96.04% accuracy, 1.02 percentage points above standard RoBERTa. Most notably, it showed substantial gains on challenging categories, improving gender-based hate speech detection by 2.36 percentage points. Crucially, this performance boost comes with only a 0.33% increase in parameter count, making it a computationally efficient option for real-world, large-scale content moderation platforms that require fine-grained analysis.
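To make the pipeline concrete, here is a minimal NumPy sketch of the two components the article describes: a GCN layer that propagates features over an ontology graph, and an attention step in which a text embedding attends over the resulting concept embeddings. This is an illustrative reconstruction under stated assumptions, not the authors' exact formulation; the toy ontology, dimensions, and the `[CLS]`-style text vector are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(A, X, W):
    """One graph-convolution layer with symmetric normalization
    (Kipf & Welling style) and a ReLU nonlinearity."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

def ontology_guided_attention(text_emb, node_emb):
    """Text embedding attends over ontology concept embeddings;
    returns the text vector fused with the attended ontology summary."""
    scores = node_emb @ text_emb            # similarity to each concept
    scores -= scores.max()                  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax
    context = weights @ node_emb            # weighted ontology summary
    return np.concatenate([text_emb, context])

# Hypothetical toy ontology: a 'hate' hub linked to race/religion/gender.
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
X = rng.normal(size=(4, 8))                 # initial concept features
W = rng.normal(size=(8, 8))                 # learnable GCN weights
node_emb = gcn_layer(A, X, W)

text_emb = rng.normal(size=8)               # stand-in for a RoBERTa [CLS] vector
fused = ontology_guided_attention(text_emb, node_emb)
print(fused.shape)                          # fused text+ontology representation
```

In the real model, `fused` would feed a classification head over the demographic categories; the sketch only shows why the parameter overhead stays small, since the only additions beyond RoBERTa are the GCN weights and the attention fusion.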
- Achieves 96.04% accuracy, a 1.02-percentage-point improvement over standard RoBERTa on a dataset of 39,747 samples.
- Shows major gains on tough categories: gender-based hate speech detection improved by 2.36 percentage points.
- Adds structured knowledge via Graph Convolutional Networks with minimal computational cost (0.33% parameter overhead).
Why It Matters
Enables social platforms to deploy more accurate, nuanced, and scalable AI moderation for harmful content targeting specific groups.