Detecting Hate and Inflammatory Content in Bengali Memes: A New Multimodal Dataset and Co-Attention Framework
A new 3,247-meme dataset and the MCFM model tackle culturally specific hate speech in a low-resource language.
A research team from Sylhet Engineering College and Daffodil International University has published a paper addressing a gap in AI content moderation: detecting hateful and inflammatory content in Bengali memes. The work introduces Bn-HIB (Bangla Hate Inflammatory Benign), a manually annotated dataset of 3,247 Bengali memes, each categorized as Benign, Hate, or Inflammatory. The authors present it as the first Bengali dataset to separate inflammatory content from direct hate speech. Memes make that distinction hard because they are often satirical, subtle, and culturally specific. Existing research has also concentrated on English and other high-resource languages, which has left low-resource languages such as Bengali largely unserved.
To analyze this multimodal data, the team proposes MCFM (Multi-Modal Co-Attention Fusion Model), a purpose-built architecture. Its co-attention mechanism lets the image and the text of a meme attend to each other, so that the most relevant features in each modality are identified before the two are fused for classification. In the team's experiments, MCFM significantly outperforms several state-of-the-art baseline models on the Bn-HIB dataset. The research provides a benchmark and a model for building more equitable and effective content moderation systems for the more than 230 million Bengali speakers worldwide, and it may pave the way for similar work in other underrepresented languages.
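The co-attention idea described above can be illustrated with a small sketch. This is not the authors' MCFM implementation, whose layers and dimensions are not given here. It is a generic, simplified form of co-attention under stated assumptions: image regions and text tokens are already encoded as equal-length vectors, affinity is a plain dot product, each region is weighted by its best-matching token (and vice versa), and the two attended vectors are concatenated before a linear classifier. The function names `co_attention_fuse` and `classify` are hypothetical, and the features and classifier weights are random placeholders.

```python
import math
import random

LABELS = ["Benign", "Hate", "Inflammatory"]  # class names from the article


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Attention-weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def co_attention_fuse(image_feats, text_feats):
    """Return (fused_vector, image_weights, text_weights).

    Assumption: a simplified co-attention, not the paper's exact design.
    """
    # Affinity between every image region (rows) and every text token (columns).
    affinity = [[dot(v, t) for t in text_feats] for v in image_feats]
    # Score each region by its best-matching token, and each token by its
    # best-matching region, then normalize the scores into attention weights.
    image_weights = softmax([max(row) for row in affinity])
    text_weights = softmax([max(col) for col in zip(*affinity)])
    attended_image = weighted_sum(image_weights, image_feats)
    attended_text = weighted_sum(text_weights, text_feats)
    # Fusion by concatenation of the two attended vectors.
    return attended_image + attended_text, image_weights, text_weights


def classify(fused, weight_rows, biases):
    """Linear layer followed by softmax over the three labels."""
    logits = [dot(row, fused) + b for row, b in zip(weight_rows, biases)]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Placeholder features: 5 image regions and 7 text tokens.
    image_feats = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
    text_feats = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(7)]
    fused, img_w, txt_w = co_attention_fuse(image_feats, text_feats)
    # Untrained, random classifier weights: the prediction is meaningless
    # and only shows the data flow.
    weights = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in LABELS]
    label, probs = classify(fused, weights, [0.0] * len(LABELS))
    print(label, [round(p, 3) for p in probs])
```

In a real system the placeholder vectors would come from pretrained image and text encoders, and the classifier weights would be learned from the annotated memes. The point of the sketch is only the order of operations: cross-modal affinity first, attention weights second, fusion last.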
- Introduces Bn-HIB, the first dataset of 3,247 Bengali memes annotated for Hate, Inflammatory, and Benign content.
- Proposes the MCFM model using a co-attention mechanism to fuse visual and textual features, outperforming existing models.
- Addresses a critical gap in AI moderation for low-resource languages, focusing on culturally specific, subtle harmful content.
Why It Matters
Supports more equitable content moderation for 230M+ Bengali speakers and offers a blueprint for other low-resource languages.