Multimodal Analysis of State-Funded News Coverage of the Israel-Hamas War on YouTube Shorts
Researchers built a multimodal AI pipeline to analyze sentiment and visuals in over 2,300 state-funded news Shorts.
Researchers Daniel Miehling and Sandra Kuebler have published a novel study applying multimodal AI to analyze geopolitical news on YouTube Shorts. Their custom pipeline processes short-form videos by first generating automatic transcripts, then performing aspect-based sentiment analysis (ABSA) on the text, and finally classifying semantic scenes from the visual frames. They applied this method to a dataset of over 2,300 conflict-related Shorts and more than 94,000 frames from major international, state-funded broadcasters covering the Israel-Hamas war.
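The three stages can be sketched as a pluggable pipeline. Everything below is illustrative, not the authors' implementation: the function names and toy stand-in stages are assumptions, standing in for the real components (automatic transcription, a fine-tuned ABSA model, and a visual scene classifier).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ShortAnalysis:
    video_id: str
    transcript: str
    aspect_sentiments: dict[str, str]  # aspect -> sentiment label
    scene_labels: list[str]            # one semantic label per sampled frame

def analyze_short(
    video_id: str,
    frames: list,
    transcribe: Callable,      # stage 1: ASR producing the transcript
    absa: Callable,            # stage 2: aspect-based sentiment on the text
    classify_scene: Callable,  # stage 3: semantic scene label per frame
) -> ShortAnalysis:
    transcript = transcribe(video_id)
    return ShortAnalysis(
        video_id=video_id,
        transcript=transcript,
        aspect_sentiments=absa(transcript),
        scene_labels=[classify_scene(f) for f in frames],
    )

# Toy stand-ins so the skeleton runs end to end; real models replace these.
def toy_transcribe(video_id):
    return "airstrikes hit the city as civilians fled"

def toy_absa(text):
    labels = {}
    if "airstrikes" in text:
        labels["military action"] = "negative"
    if "civilians" in text:
        labels["civilian impact"] = "negative"
    return labels

def toy_scene(frame):
    return "urban destruction" if frame.get("rubble") else "political speech"

result = analyze_short(
    "abc123",
    frames=[{"rubble": True}, {"rubble": False}],
    transcribe=toy_transcribe,
    absa=toy_absa,
    classify_scene=toy_scene,
)
```

Keeping each stage a swappable callable mirrors how the framework could be retargeted at TikTok or Reels content: only the ingestion and model choices change, not the pipeline shape.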
The analysis revealed that the sentiment expressed toward specific aspects of the conflict (like 'military action' or 'civilian impact') differed significantly across news outlets and evolved over time. In contrast, the AI-classified visual scenes—such as 'urban destruction' or 'political speeches'—were consistent across outlets and aligned with real-world events. A key technical finding was that smaller, specialized models fine-tuned for the news-analysis domain outperformed larger, general-purpose transformers and even large language models (LLMs) on the sentiment task, highlighting an efficient path for computational social science.
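One way to surface this kind of outlet- and time-dependence is to average aspect sentiment per (outlet, week, aspect) cell and compare cells across outlets. A minimal sketch, assuming ABSA labels have already been mapped to scores in [-1, 1]; the grouping keys and sample rows are hypothetical, not from the study's data:

```python
from collections import defaultdict

def mean_aspect_sentiment(records):
    """Average sentiment per (outlet, week, aspect) cell.

    records: iterable of (outlet, week, aspect, score) tuples,
    where score is a sentiment value in [-1, 1].
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum, count]
    for outlet, week, aspect, score in records:
        cell = totals[(outlet, week, aspect)]
        cell[0] += score
        cell[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}

# Hypothetical rows: two outlets covering the same aspect in the same week.
rows = [
    ("OutletA", "2023-W42", "civilian impact", -0.8),
    ("OutletA", "2023-W42", "civilian impact", -0.6),
    ("OutletB", "2023-W42", "civilian impact", -0.2),
]
means = mean_aspect_sentiment(rows)
# Diverging cell means for the same aspect and week flag framing differences.
```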
The study's framework is designed as a template for analyzing other short-form platforms like TikTok and Instagram Reels. It demonstrates how combining quantitative multimodal AI methods with qualitative interpretation can systematically uncover patterns in sentiment and visual narrative within algorithmically driven media environments, offering a powerful tool for media researchers and analysts.
- Analyzed over 2,300 YouTube Shorts and 94,000 visual frames from state-funded news outlets covering the Israel-Hamas war.
- Found that smaller, domain-adapted AI models for sentiment analysis outperformed larger transformers and LLMs like GPT-4.
- The multimodal pipeline combines transcription, aspect-based sentiment analysis (ABSA), and scene classification, serving as a template for TikTok/Instagram research.
Why It Matters
Provides a scalable, AI-driven method to audit narrative biases and visual framing in the short-form video content that dominates modern news consumption.