[xpu][fix] Fix tensorwise scaling settings (#177810)
A recent oneDNN library update broke tensorwise scaling, causing 'could not create primitive' errors on Intel XPU hardware.
A collaborative fix from Intel and PyTorch maintainers has resolved a bug that was breaking quantized operations on Intel XPU hardware. The issue, documented in PyTorch Pull Request #177810, stemmed from an upgrade of oneDNN (the oneAPI Deep Neural Network Library) to version 3.10. That upgrade changed the API for setting tensorwise scaling factors, a crucial component of low-precision computations such as 8-bit floating-point (Float8) math. Without the corresponding update in PyTorch's XPU backend code, attempts to use these operations failed with a "could not create primitive" error, halting model execution.
The fix, which has been approved and merged into the main PyTorch repository, aligns the PyTorch-XPU integration with the latest oneDNN documentation. It specifically changes the primitive attribute settings for scaling to use an empty group specification (`groups={}`). This correction ensures that performance-critical quantization workflows, essential for efficient AI inference on Intel GPUs like the Arc series, function correctly. The developers validated the fix by reusing existing unit tests, such as `test_float8_scale`, confirming that scaled matrix multiplication operations now work as intended.
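To make "tensorwise scaling" concrete, here is a minimal pure-Python sketch of the bookkeeping involved: each operand gets a single scalar scale (in oneDNN terms, scaling mask 0 with an empty `groups` specification), values are divided by it to fit the Float8 range, and the matmul result is rescaled by the product of the two scales. This is an illustrative model only, not the PR's actual code; the function names and the e4m3 constant usage are assumptions, and a real backend would cast the scaled values to a float8 type rather than keep them as Python floats.

```python
# Conceptual sketch of tensorwise (per-tensor) scaling for fp8-style matmul.
# Hypothetical helper names; no fp8 casting is performed, we only model the
# scale arithmetic that the oneDNN primitive attributes configure.

E4M3_MAX = 448.0  # largest finite value in the float8 e4m3 format

def tensorwise_scale(matrix):
    """One scale for the whole tensor (mask 0 / empty groups in oneDNN terms)."""
    amax = max(abs(v) for row in matrix for v in row)
    return amax / E4M3_MAX if amax > 0 else 1.0

def scaled_matmul(a, b):
    # One scalar scale per operand -- this is what "tensorwise" means.
    sa, sb = tensorwise_scale(a), tensorwise_scale(b)
    # Dividing by the scale brings values into the fp8-representable range;
    # a real backend would cast to float8 here.
    qa = [[v / sa for v in row] for row in a]
    qb = [[v / sb for v in row] for row in b]
    n, k, m = len(qa), len(qb), len(qb[0])
    out = [[sum(qa[i][t] * qb[t][j] for t in range(k)) for j in range(m)]
           for i in range(n)]
    # Dequantize the accumulator with the product of both scales.
    return [[v * sa * sb for v in row] for row in out]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(scaled_matmul(a, b))  # matches the unscaled product up to rounding
```

Because both scales are plain scalars, dequantization is a single multiply per output element, which is why tensorwise scaling is attractive for inference-time performance.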
- Bug triggered by the upgrade to oneDNN v3.10, which changed the API for setting tensorwise scaling factors.
- Fix changes primitive settings to `groups={}` to match new oneDNN documentation.
- Restores functionality for Float8 quantized operations critical for efficient AI inference on Intel XPUs.
Why It Matters
This fix is essential for developers running PyTorch models with quantization on Intel GPUs, ensuring stable and efficient inference.