Can LLMs Reason About Attention? Towards Zero-Shot Analysis of Multimodal Classroom Behavior
A new pipeline uses AI to analyze student engagement without storing any identifiable video footage.
A research team led by Nolan Platt has published a method for analyzing student engagement in classrooms with large language models (LLMs) while prioritizing privacy. Their system processes classroom video through a pipeline that first runs OpenPose to extract skeletal keypoints and Gaze-LLE to estimate where each student is looking. The original video frames are deleted immediately after this processing, and only anonymized geometric coordinates are retained, stored as JSON files. This design is intended to support compliance with educational privacy laws such as FERPA. The extracted pose and gaze data are then passed to the QwQ-32B-Reasoning model, which performs a zero-shot analysis (that is, without task-specific training examples) of student behavior across different segments of a lecture.
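The privacy step described above can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes OpenPose and Gaze-LLE have already run, that each person's pose is a flat [x, y, confidence, ...] list (the layout OpenPose uses for `pose_keypoints_2d` in its JSON output), and that each gaze estimate has been reduced to a normalized (x, y) target point. The helper name `store_anonymized_frame` and the record schema are hypothetical.

```python
import json
import os
from pathlib import Path


def store_anonymized_frame(frame_path, frame_index, people, out_dir):
    """Hypothetical helper: keep only geometry for one frame, then delete the image.

    `people` is assumed to be a list of dicts with:
      "pose": flat [x0, y0, c0, x1, y1, c1, ...] keypoints (OpenPose-style layout)
      "gaze": normalized (x, y) gaze target in [0, 1], or None if unavailable
    """
    out_path = Path(out_dir) / f"frame_{frame_index:06d}.json"
    try:
        record = {
            "frame": frame_index,
            "people": [
                {
                    "id": i,  # per-frame index only, not a student identity
                    "pose": [round(v, 2) for v in p["pose"]],
                    "gaze": list(p["gaze"]) if p["gaze"] is not None else None,
                }
                for i, p in enumerate(people)
            ],
        }
        out_path.write_text(json.dumps(record))
    finally:
        # Always remove the raw frame so no identifiable pixels stay on disk.
        os.remove(frame_path)
    return out_path
```

Deleting the frame in a `finally` clause means the image is removed even if writing the JSON fails, which favors privacy over recoverability; a real deployment might weigh that trade-off differently.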
The QwQ-32B model's outputs are presented to instructors through a web dashboard with visualizations such as attention heatmaps and textual summaries of classroom dynamics. The work is a step toward using multimodal AI for automated educational analytics, a task that traditionally requires manual, time-consuming classroom observation. The authors note a key limitation, however: the LLM still struggles with spatial reasoning about physical classroom layouts, such as determining which students are within each other's lines of sight. The work outlines a promising, scalable direction for AI in education while identifying areas, notably spatial comprehension in models, that need further development before such tools are fully robust.
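One plausible way to build an attention heatmap like the dashboard's from the stored coordinates is to bin gaze targets into a coarse grid. This sketch assumes each gaze estimate has already been reduced to a normalized (x, y) point in image coordinates; the function name `gaze_heatmap` and the default 4 x 4 grid are illustrative choices, not details from the paper.

```python
def gaze_heatmap(gaze_points, rows=4, cols=4):
    """Count normalized (x, y) gaze targets into a rows x cols grid.

    Missing estimates (None) and points outside [0, 1] are skipped.
    Row 0 corresponds to the top of the image (small y values).
    """
    grid = [[0] * cols for _ in range(rows)]
    for pt in gaze_points:
        if pt is None:
            continue
        x, y = pt
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            continue
        # Clamp so that x == 1.0 or y == 1.0 lands in the last cell, not out of range.
        c = min(int(x * cols), cols - 1)
        r = min(int(y * rows), rows - 1)
        grid[r][c] += 1
    return grid
```

For example, `gaze_heatmap([(0.1, 0.1), (0.9, 0.9), (0.6, 0.1)], rows=2, cols=2)` returns `[[1, 1], [0, 1]]`. Dividing each cell by the total count turns the grid into the share of attention per region.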
- Privacy-first design deletes raw video after processing, storing only anonymized JSON coordinates to support FERPA compliance.
- Uses a two-stage pipeline: OpenPose/Gaze-LLE for feature extraction, then the QwQ-32B-Reasoning LLM for zero-shot behavioral analysis.
- Provides instructors with a dashboard of attention heatmaps and summaries, automating a traditionally manual observation task.
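The hand-off to the LLM might look roughly like this: per-frame records are chunked into fixed-length lecture segments, and each segment is serialized into a prompt that contains only an instruction and the data, with no worked examples (the zero-shot setting). The record schema, the `INSTRUCTION` text, the `frames_per_segment` default, and the function names are hypothetical; the paper's actual prompts and segmentation may differ, and the call to QwQ-32B-Reasoning itself is omitted.

```python
import json


def split_into_segments(records, frames_per_segment):
    """Chunk per-frame records into fixed-size lecture segments, ordered by frame index."""
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    ordered = sorted(records, key=lambda r: r["frame"])
    return [ordered[i:i + frames_per_segment]
            for i in range(0, len(ordered), frames_per_segment)]


# Hypothetical instruction; the coordinate convention is an assumption.
INSTRUCTION = (
    "You are given anonymized classroom data for one lecture segment. "
    "Each frame lists students by index with 2D pose keypoints and a normalized "
    "gaze target (x, y), where (0, 0) is the top-left of the image. "
    "Describe the likely attention and engagement patterns in this segment."
)


def build_zero_shot_prompts(records, frames_per_segment=30):
    """Return one prompt per segment: the fixed instruction plus the segment's JSON.

    No example analyses are included, which is what makes the prompt zero-shot.
    """
    prompts = []
    for n, segment in enumerate(split_into_segments(records, frames_per_segment)):
        # Compact JSON keeps the prompt short and free of newlines.
        payload = json.dumps(segment, separators=(",", ":"))
        prompts.append(f"{INSTRUCTION}\n\nSegment {n + 1}:\n{payload}")
    return prompts
```

Sending only coordinates keeps the model's input free of pixels, in line with the privacy design, at the cost of leaving spatial relationships for the model to infer from numbers alone.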
Why It Matters
It could give schools a scalable, ethical way to measure engagement and improve teaching without compromising student privacy.