QASM-Eval dataset trains LLMs on OpenQASM-3 hardware programming
First benchmark for LLMs on hardware-level quantum code; fine-tuning on it yields large gains.
Quantum computing is still in the Noisy Intermediate-Scale Quantum (NISQ) era, where hardware noise severely limits performance. To mitigate that noise, programmers need features beyond simple gate sequences, such as mid-circuit measurement, classical feedback for quantum error correction (QEC), precise timing for dynamical decoupling, and pulse-level waveform access. OpenQASM-3 was designed to expose these capabilities, yet no dataset existed to train large language models (LLMs) on this hardware-level programming interface. Now, researchers Zhenxiao Fu, Lei Jiang, and Fan Chen have released QASM-Eval, the first comprehensive dataset specifically targeting OpenQASM-3 code generation.
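To make these features concrete, here is a minimal sketch of what such a program can look like. The OpenQASM 3 snippet was written for this illustration and is not taken from QASM-Eval; the `detect_features` helper and its tag names are hypothetical, and the regex matching is only a rough stand-in for a real parser (it would also match keywords inside comments). Pulse-level code (`defcal` and `cal` blocks) is left out of the sample program for brevity.

```python
import re

# Illustrative OpenQASM 3 program (written for this example, not from QASM-Eval).
PROGRAM = """\
OPENQASM 3.0;
include "stdgates.inc";

qubit[2] q;
bit c;

h q[0];
c = measure q[0];      // mid-circuit measurement
if (c == 1) {          // classical feedback on the measured bit
    x q[1];
}
delay[100ns] q[1];     // explicit timing on an idle qubit
"""

# Hypothetical feature tags with simple regex detectors.
# A real tool would parse the program instead of pattern-matching text.
FEATURE_PATTERNS = {
    "measurement": r"\bmeasure\b",
    "classical_feedback": r"\bif\s*\(",
    "timing": r"\bdelay\s*\[",
    "pulse": r"\b(defcal|cal)\b",
}


def detect_features(source: str) -> set[str]:
    """Return the feature tags whose pattern occurs anywhere in the source text."""
    return {
        name
        for name, pattern in FEATURE_PATTERNS.items()
        if re.search(pattern, source)
    }


print(sorted(detect_features(PROGRAM)))
# → ['classical_feedback', 'measurement', 'timing']
```

The sample uses three of the four feature families named above; a program with a `defcal` block would additionally be tagged `pulse`.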
QASM-Eval contains 100 expert-verified test tasks and 4,000 training tasks, systematically covering four areas: classical logic, timing and scheduling, pulse control, and complex real-world workflows. An extended verifier automatically checks syntax, quantum states, and program timelines. Initial evaluations show that current state-of-the-art LLMs perform poorly on these tasks, but targeted fine-tuning on QASM-Eval yields dramatic improvements. The dataset provides both a benchmark and a training foundation, which could speed the development of reliable LLM assistants for hardware-facing quantum programming in the NISQ era.
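The summary does not say how the verifier checks program timelines. As a toy illustration of what a timeline check could compare, the sketch below schedules operations as soon as their qubits are free and reports start times and total duration. The `Op` class, the `schedule_asap` and `total_duration` functions, and all durations are assumptions made for this example, not part of QASM-Eval or its verifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One scheduled operation; durations here are made-up example values."""
    name: str
    qubits: tuple[int, ...]
    duration_ns: int


def schedule_asap(ops: list[Op]) -> list[tuple[Op, int]]:
    """Give each op the earliest start time at which all of its qubits are free."""
    free_at: dict[int, int] = {}  # qubit index -> time (ns) when it becomes idle
    timeline = []
    for op in ops:
        start = max((free_at.get(q, 0) for q in op.qubits), default=0)
        timeline.append((op, start))
        for q in op.qubits:
            free_at[q] = start + op.duration_ns
    return timeline


def total_duration(timeline: list[tuple[Op, int]]) -> int:
    """Return the end time (ns) of the last operation to finish."""
    return max((start + op.duration_ns for op, start in timeline), default=0)


ops = [
    Op("h", (0,), 50),
    Op("delay", (1,), 100),
    Op("cx", (0, 1), 300),      # must wait for both qubits
    Op("measure", (0,), 1000),
]
timeline = schedule_asap(ops)
for op, start in timeline:
    print(f"{start:>5} ns  {op.name} on {op.qubits}")
print("total:", total_duration(timeline), "ns")
# → total: 1400 ns
```

A checker built along these lines could compare the computed start times of a generated program against those of a reference solution, which catches timing errors that a syntax check alone would miss.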
- QASM-Eval includes 100 expert-verified test tasks and 4,000 training tasks covering classical logic, timing, pulse control, and workflows.
- Current state-of-the-art LLMs perform poorly on OpenQASM-3 coding tasks; fine-tuning on QASM-Eval yields dramatic improvements.
- Dataset targets hardware-facing features like mid-circuit measurement, dynamical decoupling timing, and pulse-level control.
Why It Matters
QASM-Eval narrows the gap between LLM coding assistants and hardware-level quantum programming, giving models a way to learn and be tested on real NISQ hardware constraints.