BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task
GPT-5.4 scores 0.98 validation F1 on synthetic battery passport conformance task
A team of researchers from Luleå University of Technology (Tosin Adewumi, Martin Karlsson, Lama Alkhaled, Marcus Liwicki) introduced BatteryPass-12K, the first public dataset for a novel task: classifying digital battery passport (DBP) conformance under the upcoming EU battery regulation. Because no public data existed for this task, the dataset was generated synthetically from real pilot samples. The benchmark evaluates 22 language models (LMs) in zero-shot inference, including small LMs (SLMs), mixture-of-experts (MoE) models, and dense LLMs.
Key findings: thinking models such as GPT-5.4 lead, with an F1 score of 0.98 on validation and 0.71 on test; few-shot examples significantly improve performance; frontier models find the task challenging; scaling parameters does not guarantee better results, as SLMs sometimes beat LLMs; and prompt-injection attacks degrade accuracy. The dataset is released under the permissive CC-BY-4.0 license, making it accessible for further research on battery lifecycle reasoning and related tasks.
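The headline scores are F1 values. F1 is the harmonic mean of precision and recall; the sketch below computes it per label and then averages across labels (macro F1). The label names `conformant` and `non_conformant` are hypothetical, and the summary does not say whether the reported scores are macro-, micro-, or weighted-averaged, so the averaging choice here is an assumption rather than the benchmark's documented method.

```python
from typing import Sequence


def f1_for_label(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 for one label, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical gold labels and model predictions for four passport records.
    gold = ["conformant", "conformant", "non_conformant", "non_conformant"]
    pred = ["conformant", "non_conformant", "non_conformant", "non_conformant"]
    print(round(macro_f1(gold, pred), 4))
```

In this toy case the `conformant` label gets F1 = 2/3 and `non_conformant` gets 0.8, so the macro average is about 0.7333; one missed conformant record is enough to pull the score well below 1.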
- BatteryPass-12K is the first public dataset for digital battery passport conformance, built synthetically from real pilot samples to support the upcoming EU battery regulation.
- GPT-5.4 achieved top performance (F1 = 0.98 on validation, 0.71 on test). Few-shot examples boosted performance across all models.
- Prompt-injection attacks degraded performance, and scaling model parameters didn't always improve results: SLMs sometimes outperformed larger LLMs.
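The zero-shot versus few-shot distinction in these findings comes down to what goes into the prompt. A minimal sketch, assuming a plain-text prompt with hypothetical record and label strings (the summary does not give the benchmark's actual prompt template, and `INSTRUCTION`, `Example`, and `build_prompt` are illustrative names): zero-shot sends only the instruction and the query record, while few-shot prepends labeled examples.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Example:
    passport_text: str  # hypothetical serialized passport record
    label: str          # hypothetical label, e.g. "conformant"


# Hypothetical instruction text; not the benchmark's actual wording.
INSTRUCTION = (
    "Classify whether the following digital battery passport record "
    "conforms to the EU battery regulation. Answer with one label."
)


def build_prompt(query: str, shots: Sequence[Example] = ()) -> str:
    """Zero-shot when `shots` is empty; few-shot when labeled examples are given."""
    parts = [INSTRUCTION]
    for ex in shots:
        # Each demonstration shows a record followed by its gold label.
        parts.append(f"Record: {ex.passport_text}\nLabel: {ex.label}")
    # The query ends with an open "Label:" slot for the model to complete.
    parts.append(f"Record: {query}\nLabel:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    shots = [Example("record A", "conformant"), Example("record B", "non_conformant")]
    print(build_prompt("record C"))         # zero-shot
    print("---")
    print(build_prompt("record C", shots))  # few-shot
```

Because the record text is pasted verbatim into the prompt, any instruction hidden inside a record also reaches the model, which is the opening that prompt-injection attacks exploit.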
Why It Matters
Lays groundwork for automated compliance verification of EU digital battery passports, which is critical for supply-chain transparency and regulatory readiness.