Double-Precision Matrix Multiplication Emulation via Ozaki-II Scheme with FP8 Quantization
New technique cuts the number of FP8 matrix multiplications needed, enabling high-precision AI and HPC workloads on next-gen GPUs like NVIDIA's Blackwell Ultra.
A team of researchers from Japan, including Katsuhisa Ozaki (namesake of the method), has published a paper proposing a new technique for high-performance computing (HPC) and AI. The technique emulates double-precision (FP64) matrix multiplication, a core operation for scientific simulations and AI model training, using the much faster FP8 arithmetic units found in next-generation GPUs like NVIDIA's Blackwell Ultra and Rubin architectures. This matters because FP64 is essential for numerical accuracy, yet recent hardware advances have focused on boosting low-precision formats like FP8, leaving FP64 performance gains modest.
The key innovation is adapting the established Ozaki-II emulation scheme to work with FP8 hardware, which the original algorithm could not do. Prior methods, like the Ozaki-I scheme, could use FP8 but were less efficient. The new approach significantly reduces the total number of FP8 matrix multiplications required to achieve an FP64-equivalent result. As a result, simulations and large language model (LLM) training that demand high precision can run more efficiently on hardware designed primarily for AI, narrowing the gap between speed and accuracy for professional workloads.
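To give a rough feel for how one exact product can be assembled from several low-precision ones, here is a minimal sketch of a modular (Chinese remainder theorem) matrix multiplication. It rests on assumptions not stated in this article: that Ozaki-II is built on this kind of modular technique, and that the inputs are already scaled to integers. The moduli, the function names `symmetric_mod` and `crt_matmul`, and the use of 8-bit integer residues are illustrative choices; the paper's FP8 formulation is not reproduced here.

```python
import numpy as np
from math import prod

# Hypothetical moduli: pairwise coprime, each small enough that symmetric
# residues fit in 8 bits. Their product M is about 4.1e9.
MODULI = [251, 253, 255, 256]

def symmetric_mod(x, p):
    """Residue of x modulo p, shifted into [-p/2, p/2)."""
    return (x + p // 2) % p - p // 2

def crt_matmul(A, B, moduli=MODULI):
    """Exact integer product A @ B from one low-precision product per modulus.

    Valid only while every entry of A @ B lies strictly inside (-M/2, M/2).
    """
    M = prod(moduli)
    C = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for p in moduli:
        # Low-precision operands: residues stored as 8-bit integers.
        Ap = symmetric_mod(A, p).astype(np.int8)
        Bp = symmetric_mod(B, p).astype(np.int8)
        # One small matrix product per modulus, accumulated in 32 bits.
        Cp = (Ap.astype(np.int32) @ Bp.astype(np.int32)) % p
        # Standard CRT weight: (M/p) * inverse of (M/p) modulo p.
        Mp = M // p
        weight = (Mp * pow(Mp, -1, p)) % M
        C = (C + Cp.astype(np.int64) * weight) % M
    # Map the result from [0, M) back to a signed range.
    return (C + M // 2) % M - M // 2

# Usage: the reconstruction matches a direct 64-bit integer product.
rng = np.random.default_rng(0)
A = rng.integers(-1000, 1001, size=(4, 5))
B = rng.integers(-1000, 1001, size=(5, 3))
assert np.array_equal(crt_matmul(A, B), A @ B)
```

In this sketch the number of low-precision products equals the number of moduli, so covering a wider result range means adding moduli one at a time.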
- Enables FP64-equivalent precision using FP8 hardware units on GPUs like NVIDIA's Blackwell Ultra, where INT8 throughput has been reduced.
- Novel adaptation of the Ozaki-II scheme cuts the number of required FP8 matrix multiplications vs. the older Ozaki-I method.
- Addresses a critical hardware trend: future performance gains in HPC and AI depend on leveraging high-throughput low-precision arithmetic like FP8.
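The cost of a slice-based scheme can be illustrated with a simplified fixed-point sketch: each matrix is split into narrow slices, and every pair of slices is multiplied. This is an assumption-laden illustration in the spirit of Ozaki-I, not the published algorithm; the 4-bit slice width, the slice count, and the names `split_slices` and `sliced_matmul` are hypothetical.

```python
import numpy as np

def split_slices(X, bits, count):
    """Split an integer matrix so that X == sum(s[i] * 2**(bits * i)).

    The first count - 1 slices hold digits in [0, 2**bits); the last slice
    holds the remaining high part and carries the sign.
    """
    slices = []
    rest = X.astype(np.int64)
    mask = (1 << bits) - 1
    for _ in range(count - 1):
        slices.append(rest & mask)  # low digit
        rest = rest >> bits         # arithmetic shift keeps the sign
    slices.append(rest)
    return slices

def sliced_matmul(A, B, bits=4, count=4):
    """Return (A @ B, number of slice products) using all slice pairs."""
    sa = split_slices(A, bits, count)
    sb = split_slices(B, bits, count)
    C = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    products = 0
    for i, Ai in enumerate(sa):
        for j, Bj in enumerate(sb):
            # Each slice pair is one narrow-operand matrix product.
            C += (Ai @ Bj) * 2 ** (bits * (i + j))
            products += 1
    return C, products

# Usage: 16-bit signed inputs split into four 4-bit slices.
rng = np.random.default_rng(0)
A = rng.integers(-30000, 30001, size=(4, 5))
B = rng.integers(-30000, 30001, size=(5, 3))
C, products = sliced_matmul(A, B)
assert np.array_equal(C, A @ B)
```

In this unoptimized form the product count grows with the square of the slice count (16 products for 4 slices), which is the kind of overhead a scheme needing fewer multiplications aims to avoid.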
Why It Matters
The technique allows scientists and AI engineers to run high-precision calculations efficiently on the latest AI-optimized hardware, accelerating research and model development.