Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest
First comprehensive benchmark of 7 LLMs across authorship, generation, and inference...
A new arXiv paper presents the first comprehensive evaluation of modern LLMs (GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT) across three core social media analytics tasks on a Twitter (X) dataset. For authorship verification, the team introduced a systematic sampling framework spanning diverse user and post selection strategies, and evaluated generalization on newly collected tweets from January 2024 onward to mitigate seen-data bias. For post generation, they assessed how well LLMs produce authentic, user-like content under comprehensive evaluation metrics, and bridged the two tasks with a user study measuring how real users perceive LLM-generated posts conditioned on their own writing.
For user attribute inference, the researchers annotated occupations and interests using two standardized taxonomies (IAB Tech Lab 2023 and 2018 U.S. SOC), benchmarking LLMs against existing baselines. The study provides new insights into how well these models handle social media analytics tasks, establishing reproducible benchmarks for the field. The code and data are provided in the supplementary material and will be made publicly available upon publication.
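The seen-data-bias mitigation described above can be illustrated with a small sketch: restrict evaluation pairs to posts written after a cutoff date (January 2024), so the test set post-dates the models' likely training data. The function and data layout here are hypothetical, not the paper's actual framework.

```python
from datetime import datetime
import random

# Hypothetical sketch of one sampling strategy for authorship verification:
# keep only posts written after a cutoff date so the evaluation set
# post-dates the LLMs' training data (mitigating seen-data bias).

CUTOFF = datetime(2024, 1, 1)

def sample_verification_pairs(posts_by_user, n_pairs, seed=0):
    """Build (post_a, post_b, same_author) pairs from post-cutoff tweets.

    posts_by_user: dict mapping user id -> list of (timestamp, text).
    Returns labelled pairs, balanced between same- and cross-author cases.
    """
    rng = random.Random(seed)
    # Keep only post-cutoff posts, and only users with at least two of them.
    fresh = {u: [text for ts, text in ps if ts >= CUTOFF]
             for u, ps in posts_by_user.items()}
    fresh = {u: ps for u, ps in fresh.items() if len(ps) >= 2}
    users = list(fresh)
    pairs = []
    for i in range(n_pairs):
        if i % 2 == 0:  # same-author pair
            u = rng.choice(users)
            a, b = rng.sample(fresh[u], 2)
            pairs.append((a, b, True))
        else:           # cross-author pair
            u, v = rng.sample(users, 2)
            pairs.append((rng.choice(fresh[u]), rng.choice(fresh[v]), False))
    return pairs
```

Varying which users and which posts are eligible (e.g. by activity level or post length) yields the "diverse user and post selection strategies" the paper sweeps over.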
- Evaluated 7 LLMs including GPT-4o, Gemini 1.5 Pro, DeepSeek-V3, and Llama 3.2 on 3 social media tasks
- Introduced a systematic sampling framework to mitigate seen-data bias using tweets from Jan 2024 onward
- Used IAB Tech Lab 2023 and 2018 U.S. SOC taxonomies for standardized occupation/interest annotation
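Standardized taxonomies make attribute inference a closed-set task: a free-form LLM generation must be mapped onto one of the taxonomy's labels. A minimal sketch of that mapping step, using fuzzy string matching and an illustrative label list (not the actual IAB or SOC taxonomy):

```python
import difflib

# Illustrative labels only; the paper uses the IAB Tech Lab 2023 and
# 2018 U.S. SOC taxonomies, which are much larger.
TAXONOMY = ["Healthcare Practitioners", "Software Developers",
            "Legal Occupations", "Education and Training"]

def snap_to_taxonomy(prediction, labels=TAXONOMY, cutoff=0.4):
    """Map a free-form prediction to the closest taxonomy label.

    Returns None when nothing in the taxonomy is a plausible match,
    so off-taxonomy generations can be scored as errors.
    """
    lowered = [label.lower() for label in labels]
    matches = difflib.get_close_matches(prediction.lower(), lowered,
                                        n=1, cutoff=cutoff)
    if not matches:
        return None
    return labels[lowered.index(matches[0])]
```

Constraining outputs this way is what makes results comparable across LLMs and against existing baselines.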
Why It Matters
Establishes reproducible benchmarks for LLM-driven social media analytics, crucial for content moderation and user understanding.