[P] QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2)
A tiny 1.5B model achieves 84.9% accuracy on A1–C2 English proficiency classification using QLoRA.
A developer has fine-tuned Qwen2.5-1.5B using QLoRA (4-bit NF4 quantization) to classify English texts into the six CEFR proficiency levels (A1 through C2). The goal is to support adaptive language learning systems, placement testing, readability estimation, and other educational NLP applications. The dataset consists of 1,785 synthetically generated English texts balanced across all six levels and 10 topic domains. Generation was performed with Llama-3.3-70B via the Groq API, using prompt constraints designed to control vocabulary complexity, grammatical progression, sentence-structure variation, and other CEFR-specific linguistic patterns at each level. Only about 0.28% of the model's parameters were trained via LoRA adapters.
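The ~0.28% figure follows from LoRA's low-rank parameterization. The sketch below is a back-of-the-envelope estimate, not the developer's published config: the rank, target modules, and parameter counts are assumptions (Qwen2.5-1.5B's hidden size, layer count, and grouped-query attention dims are from its public config; rank 16 on the attention projections is a guess that happens to land near 0.28%).

```python
# Rough estimate of the LoRA trainable-parameter fraction for Qwen2.5-1.5B.
# Rank and target modules below are ASSUMPTIONS, not the developer's config.

HIDDEN = 1536          # Qwen2.5-1.5B hidden size
KV_DIM = 2 * 128       # 2 key/value heads of dim 128 (grouped-query attention)
NUM_LAYERS = 28
TOTAL_PARAMS = 1.54e9  # approximate total parameter count
RANK = 16              # assumed LoRA rank

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A LoRA adapter adds two low-rank matrices: (d_in x r) and (r x d_out)."""
    return r * (d_in + d_out)

# Assume adapters on the four attention projections (q, k, v, o) only.
per_layer = (
    lora_params(HIDDEN, HIDDEN, RANK)    # q_proj
    + lora_params(HIDDEN, KV_DIM, RANK)  # k_proj
    + lora_params(HIDDEN, KV_DIM, RANK)  # v_proj
    + lora_params(HIDDEN, HIDDEN, RANK)  # o_proj
)
trainable = per_layer * NUM_LAYERS
pct = 100 * trainable / TOTAL_PARAMS
print(f"{trainable:,} trainable params ~= {pct:.2f}% of the model")
```

Under these assumptions the estimate comes out near 0.28%, which is why LoRA fine-tuning fits comfortably on a single consumer GPU once the frozen base weights are quantized to 4-bit NF4.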
The results on a held-out test set of 179 samples show an accuracy of 84.9% and a macro F1 of 84.9%. Per-level recall varies: A1 at 96.6%, A2 and B1 at 90%, B2 and C1 at 86.7%, and C2 at 60%. The majority of errors stem from confusion between C1 and C2, an expected challenge given the subtle linguistic boundary between those two levels. The developer has also built a FastAPI inference API and a Docker deployment setup, making the model easy to integrate. The model is available on Hugging Face as 'yanou16/cefr-english-classifier', and the developer welcomes feedback on evaluation methodology, synthetic data quality, and improving C2 classification performance.
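Per-level recall and macro F1 both fall out of a 6x6 confusion matrix (rows = true level, columns = predicted level). A minimal stdlib-only sketch, using an illustrative made-up matrix rather than the developer's actual results, shows how a dominant C1/C2 confusion drags down C2 recall while the other levels stay high:

```python
# Per-level recall/precision/F1 and macro F1 from a confusion matrix.
# The matrix below is ILLUSTRATIVE, not the reported test results.

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def per_level_metrics(cm):
    metrics = {}
    for i, level in enumerate(LEVELS):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                 # true instances predicted as other levels
        fp = sum(row[i] for row in cm) - tp  # other levels predicted as this one
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[level] = {"recall": recall, "precision": precision, "f1": f1}
    return metrics

def macro_f1(cm):
    # Macro F1 = unweighted mean of per-level F1 scores.
    m = per_level_metrics(cm)
    return sum(v["f1"] for v in m.values()) / len(m)

# Strong diagonal, with C2 frequently misread as C1 (last row).
cm = [
    [29,  1,  0,  0,  0,  0],
    [ 1, 27,  2,  0,  0,  0],
    [ 0,  2, 27,  1,  0,  0],
    [ 0,  0,  2, 26,  2,  0],
    [ 0,  0,  0,  2, 26,  2],
    [ 0,  0,  0,  0, 12, 18],
]
m = per_level_metrics(cm)
print(f"A1 recall: {m['A1']['recall']:.1%}, C2 recall: {m['C2']['recall']:.1%}")
print(f"macro F1: {macro_f1(cm):.1%}")
```

Because macro F1 averages the six levels without weighting, a single weak level (here C2) pulls the aggregate down even when overall accuracy looks strong.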
- QLoRA 4-bit fine-tuning of Qwen2.5-1.5B trains only 0.28% of parameters, achieving 84.9% accuracy on CEFR classification.
- Dataset of 1,785 synthetic texts generated using Llama-3.3-70B via Groq API, balanced across 6 levels and 10 domains.
- Per-level recall ranges from 96.6% (A1) to 60% (C2); C1/C2 confusion is the main error source.
Why It Matters
Enables lightweight, deployable AI for adaptive language learning and placement testing without requiring large models.