Machine learning and digital pragmatics: Which word category influences emoji use most?
Researchers trained MARBERT on 8,695 tweets to predict emoji categories.
A new study by Mohammed Q. Shormani, Ibrahim Abdulmalik Hassan, and Muneef Y. Alshawsh applies the MARBERT model to predict emoji usage in Arabic tweets. The team used Python to collect 11,379 tweets from Twitter, then filtered these down to 8,695 for analysis. Each tweet was assigned to one of 14 emoji categories, and the categories were numerically encoded as labels. The researchers designed a preprocessing pipeline to examine relationships between lexical features and emoji categories, and fine-tuned MARBERT to predict emoji use from text.
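To make the label-encoding step concrete, here is a minimal sketch of how emoji categories might be turned into integer labels before fine-tuning. The category names below are hypothetical placeholders (the study's 14 categories are not listed here), and the helper names `build_label_map` and `encode_labels` are illustrative, not taken from the paper.

```python
def build_label_map(categories):
    """Map each distinct category name to a stable integer id (alphabetical order)."""
    return {name: idx for idx, name in enumerate(sorted(set(categories)))}


def encode_labels(categories, label_map):
    """Replace each category name with its integer id."""
    return [label_map[name] for name in categories]


# Hypothetical categories for five tweets; the real study uses 14 categories.
example_categories = ["joy", "sadness", "love", "joy", "anger"]

label_map = build_label_map(example_categories)
encoded = encode_labels(example_categories, label_map)

print(label_map)
print(encoded)
```

Sorting the names before numbering keeps the mapping reproducible across runs, which matters when predicted ids later have to be mapped back to categories.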
The model reached an overall accuracy of 75% and was also evaluated with precision, recall, and F1-scores. While the result is promising, the study concludes that MARBERT and similar models need improvement for low-resource, multidialectal languages like Arabic. The research contributes to digital pragmatics by linking word categories to emoji choices, with implications for cross-cultural AI communication tools.
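For readers unfamiliar with these metrics, the sketch below shows how accuracy and per-class precision, recall, and F1 can be computed for a multi-class classifier. The labels and predictions are toy values invented for illustration, not data or results from the study, and the function name `evaluate` is an assumption.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return overall accuracy and per-class precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrongly chosen
            fn[truth] += 1   # true class was missed

    per_class = {}
    for c in sorted(set(y_true) | set(y_pred)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, per_class


# Toy example with three classes (not study data).
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0, 2, 2]

accuracy, per_class = evaluate(y_true, y_pred)
print(accuracy)
print(per_class)
```

Per-class scores are worth reporting alongside accuracy because, with 14 categories of unequal size, a single accuracy figure can hide weak performance on rarer categories.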
- MARBERT model achieved 75% accuracy predicting emoji categories from 8,695 Arabic tweets.
- Tweets were classified into 14 emoji categories, numerically encoded as labels.
- Study highlights need for better ML models for low-resource, multidialectal languages like Arabic.
Why It Matters
Linking word categories to emoji predictions in Arabic could help improve cross-cultural AI communication tools.