
A CEFR-Inspired Classification Framework with Fuzzy C-Means To Automate Assessment of Programming Skills in Scratch

New AI system classifies programming skills into CEFR levels, revealing a major 'B2 bottleneck' for learners.

Deep Dive

A team of researchers led by Ricardo Hidalgo-Aragón and Jesús M. González-Barahona has published an AI framework designed to automate the large-scale assessment of programming skills. The system applies Fuzzy C-Means clustering, a machine learning technique that gives each item a degree of membership in every cluster rather than a single hard label, to more than 2 million Scratch projects from the Dr. Scratch platform. Its core innovation is mapping the resulting clusters onto the levels of the Common European Framework of Reference for Languages (CEFR), yielding A1 to C2 proficiency levels for computational thinking. This gives schools and edtech platforms a standardized, transparent metric.
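The clustering and level-mapping idea described above can be sketched in a few lines of NumPy. This is a minimal, generic Fuzzy C-Means, not the authors' implementation: the fuzzifier `m=2.0`, the synthetic three-feature scores, and the rule of ordering clusters by the sum of their center coordinates before naming them A1 to C2 are all assumptions made for illustration.

```python
import numpy as np

CEFR = ["A1", "A2", "B1", "B2", "C1", "C2"]


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Generic Fuzzy C-Means. Returns (centers, memberships).

    X is (n_samples, n_features); memberships is (n_samples, n_clusters)
    and every row sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Start from random memberships, normalised per sample.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Standard update: u_ij is proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    # Recompute centers so they match the final memberships.
    W = U ** m
    centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U


def cefr_labels(centers, U):
    """Name clusters A1..C2 by ascending center score sum (an assumption),
    then label each sample with its highest-membership cluster."""
    rank_of_cluster = np.empty(len(centers), dtype=int)
    rank_of_cluster[np.argsort(centers.sum(axis=1))] = np.arange(len(centers))
    return [CEFR[rank_of_cluster[k]] for k in U.argmax(axis=1)]


# Synthetic stand-in for per-project scores (hypothetical data, 3 features).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 3)) for c in range(6)])
centers, U = fuzzy_c_means(X, n_clusters=6)
labels = cefr_labels(centers, U)
print(labels[0], np.round(U[0], 3))
```

Because each row of `U` is a full membership vector rather than a single label, the same output can feed both a headline level and finer-grained progress indicators.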

Beyond simple classification, the framework introduces additional metrics that identify learners in transitional states between levels, enabling continuous progress tracking. It also quantifies how certain each classification is, so the system can give automated feedback when certainty is high and trigger human instructor review when it is not. The analysis of the dataset also exposed systemic learning gaps, most notably a 'B2 bottleneck': only 13.3% of learners are classified at this intermediate level, with progression hindered by the cognitive load of integrating advanced concepts like Logic Synchronization and Data Representation.
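One way to picture the transitional-state and certainty ideas above is to read them directly off a fuzzy membership vector. The sketch below is illustrative only: the function name `assess`, the use of the highest membership as the certainty score, and the thresholds `review_below=0.5` and `transition_above=0.3` are hypothetical choices, not the metrics defined in the paper.

```python
import numpy as np

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def assess(memberships, review_below=0.5, transition_above=0.3):
    """Interpret one membership vector ordered A1..C2 (values sum to 1).

    Hypothetical rules:
    - level: the level with the highest membership
    - certainty: that highest membership value
    - toward: the second-strongest level, reported only if it is adjacent
      to the top level and its membership reaches `transition_above`
    - needs_review: True when certainty falls below `review_below`
    """
    u = np.asarray(memberships, dtype=float)
    order = np.argsort(u)[::-1]  # indices from strongest to weakest
    top, second = int(order[0]), int(order[1])
    certainty = float(u[top])
    in_transition = u[second] >= transition_above and abs(top - second) == 1
    return {
        "level": LEVELS[top],
        "certainty": certainty,
        "toward": LEVELS[second] if in_transition else None,
        "needs_review": certainty < review_below,
    }


# A clear B2 project: high certainty, no review needed.
print(assess([0.02, 0.03, 0.05, 0.80, 0.07, 0.03]))
# A project split between B1 and B2: flagged as in transition and for review.
print(assess([0.00, 0.05, 0.45, 0.40, 0.10, 0.00]))
```

In practice the thresholds would need to be tuned against instructor judgments; the point is only that a membership vector carries enough information to support both signals.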

The paper, accepted at the CSEDU 2026 conference, positions this framework as a response to the growing need of schools and training platforms to assess programming proficiency reproducibly and at scale. By providing actionable, data-driven insights into curriculum effectiveness and individual learner pathways, it moves beyond simple grading to support personalized education. The methodology bridges AI, machine learning, and software engineering education, and it offers a model that could be adapted to assess skills in other block-based or even text-based programming environments.

Key Points
  • Applies Fuzzy C-Means clustering to automatically classify 2,008,246 Scratch projects into CEFR skill levels (A1-C2).
  • Reveals a 'B2 bottleneck': only 13.3% of learners are classified at B2, with Logic Synchronization and Data Representation identified as key hurdles.
  • Provides certainty-based triggers for human intervention, balancing scalable automated assessment with necessary instructor review.

Why It Matters

Enables scalable, personalized programming education by automatically diagnosing learners' skill levels and systemic curriculum gaps from project data.