Toward Generalized Cross-Lingual Hateful Language Detection with Web-Scale Data and Ensemble LLM Annotations
A new method combining web-scale data and ensemble LLM annotations improves hate speech detection F1 by up to 11% for small models.
A team of researchers has published a paper demonstrating a significant advance in automated hate speech detection across multiple languages. Their approach combines two powerful techniques: continued pre-training on massive, unlabeled web data from Common Crawl's OpenWebText corpus, and the generation of synthetic training labels using an ensemble of four leading open-source large language models. The study focused on four languages—English, German, Spanish, and Vietnamese—and tested the method across sixteen different benchmark datasets.
The results show clear, measurable improvements. The continued pre-training step alone provided an average macro-F1 gain of approximately 3% over standard baselines. The researchers then explored three strategies for combining the annotations from the LLM ensemble: mean averaging, majority voting, and a LightGBM meta-learner. The LightGBM ensemble proved most effective. Fine-tuning models on these synthetic labels yielded substantial benefits, especially for smaller, more efficient models. The 1-billion-parameter Llama3.2-1B saw an impressive 11% increase in pooled F1 score, while the larger 14-billion-parameter Qwen2.5-14B showed a more modest 0.6% gain.
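The two simpler combination strategies can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names and the 0.5 decision threshold are assumptions, and the third strategy (the LightGBM meta-learner) is only described in the comments, since it additionally requires a labeled development set to train on.

```python
# Sketch of two of the three annotation-combination strategies described
# above, applied to hateful-probability scores from the four annotator LLMs.
# The third strategy trains a meta-learner (LightGBM in the paper) on these
# per-model scores against gold labels instead of aggregating them directly.

def mean_average(scores, threshold=0.5):
    """Average the per-model probabilities, then threshold once."""
    return sum(scores) / len(scores) >= threshold

def majority_vote(scores, threshold=0.5):
    """Threshold each model's probability, then take a strict majority label."""
    votes = [s >= threshold for s in scores]
    return sum(votes) > len(votes) / 2

# Hypothetical scores for one text from Mistral-7B, Llama3.1-8B,
# Gemma2-9B, and Qwen2.5-14B.
scores = [0.9, 0.4, 0.7, 0.3]

print(mean_average(scores))   # mean = 0.575 -> True
print(majority_vote(scores))  # 2 of 4 votes is not a strict majority -> False
```

Note that the two strategies can disagree on the same input, as here: two confident positive scores pull the mean above the threshold even though the ensemble splits 2–2 on votes. A trained meta-learner can resolve such ties by weighting the more reliable models.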
This research highlights a practical and scalable pathway for improving content moderation systems, particularly for languages where labeled training data is scarce. The findings indicate that the combination of web-scale data and LLM-powered synthetic annotation is most valuable for smaller models and low-resource settings, offering a cost-effective way to enhance performance without requiring massive human-labeled datasets. The paper has been accepted for presentation at LREC 2026.
- Method combines web-scale unlabeled data (Common Crawl's OpenWebText) with synthetic labels from an ensemble of 4 LLMs (Mistral-7B, Llama3.1-8B, Gemma2-9B, Qwen2.5-14B).
- Achieved an average 3% macro-F1 gain across 16 benchmarks, with an 11% F1 boost for the small Llama3.2-1B model.
- The LightGBM meta-learner ensemble strategy outperformed simpler averaging methods, proving most effective for combining LLM annotations.
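The meta-learner strategy above is a form of stacking: the four LLMs' per-text scores become features, and a classifier is fit on them against a small set of gold labels. The sketch below illustrates the idea with a hand-rolled logistic regression standing in for LightGBM (to keep the example dependency-free); the data, model roles, and hyperparameters are all invented for illustration.

```python
import math

# Stacking sketch: learn how to weight the four annotator LLMs' scores
# from a small labeled dev set, instead of averaging or voting.
# A minimal logistic regression stands in for the paper's LightGBM.

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_meta(X, y, lr=0.5, epochs=2000):
    """Fit weights over per-model scores with per-example gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Synthetic dev set: rows are [Mistral, Llama, Gemma, Qwen] scores. Suppose
# the third model is the most reliable on this slice; the meta-learner can
# then learn to weight its score most heavily.
X = [[0.9, 0.8, 0.9, 0.7], [0.2, 0.3, 0.1, 0.4],
     [0.6, 0.4, 0.8, 0.5], [0.5, 0.6, 0.2, 0.5]]
y = [1, 0, 1, 0]

w, b = train_meta(X, y)
print([predict(w, b, x) for x in X])  # fits the dev labels: [1, 0, 1, 0]
```

The design point is that a learned combiner can exploit per-model reliability differences that uniform averaging and voting ignore, which is consistent with the LightGBM ensemble outperforming the simpler strategies.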
Why It Matters
Enables more effective, scalable content moderation for low-resource languages, reducing reliance on expensive human-labeled data.