PyTorch refactors GPU code cache to support Intel XPU hardware
The change lets Intel XPU backends reuse caching logic that was previously tied to CUDA.
PyTorch developers have refactored CUDACodeCache, extracting its CUDA-independent functionality into a new CUTLASSCodeCache. This lets Intel's XPU hardware, not just NVIDIA GPUs, reuse the same code cache system. The change is part of a larger effort to improve PyTorch's Inductor compiler and make it more hardware-agnostic. It could eventually help AI models run faster on Intel accelerators, though the refactor by itself changes code structure rather than performance.
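To illustrate the kind of split described above, here is a minimal Python sketch of a backend-independent cache with thin per-device subclasses. It is not PyTorch's actual code: the class names (SharedKernelCache, CudaKernelCache, XpuKernelCache), the method names, and the placeholder "artifact" strings are all hypothetical, and real compilation is replaced by a stand-in string.

```python
import hashlib
from abc import ABC, abstractmethod


class SharedKernelCache(ABC):
    """Hypothetical device-independent part: hashing, lookup, storage."""

    def __init__(self):
        self._entries = {}       # cache key -> compiled artifact
        self.compile_count = 0   # how many times _compile actually ran

    def _key(self, source: str) -> str:
        # Include the backend name so identical source compiled for
        # two devices does not collide in the cache.
        payload = f"{self.backend_name()}:{source}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def load(self, source: str) -> str:
        # Compile only on a cache miss; otherwise return the stored artifact.
        key = self._key(source)
        if key not in self._entries:
            self._entries[key] = self._compile(source)
            self.compile_count += 1
        return self._entries[key]

    @abstractmethod
    def backend_name(self) -> str:
        ...

    @abstractmethod
    def _compile(self, source: str) -> str:
        ...


class CudaKernelCache(SharedKernelCache):
    """Hypothetical device-specific part for NVIDIA GPUs."""

    def backend_name(self) -> str:
        return "cuda"

    def _compile(self, source: str) -> str:
        # Stand-in for invoking a real compiler toolchain.
        return f"cuda-artifact:{len(source)}"


class XpuKernelCache(SharedKernelCache):
    """Hypothetical device-specific part for Intel XPU."""

    def backend_name(self) -> str:
        return "xpu"

    def _compile(self, source: str) -> str:
        # Stand-in for invoking a real compiler toolchain.
        return f"xpu-artifact:{len(source)}"


if __name__ == "__main__":
    xpu = XpuKernelCache()
    xpu.load("kernel body")
    xpu.load("kernel body")  # second call is served from the cache
    print(xpu.compile_count)
```

In this sketch, each device subclass supplies only a name and a compile step, while hashing and lookup live once in the shared base class. That is the general shape of the reuse the refactor aims for.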
Why It Matters
It signals a push toward hardware diversity in PyTorch, which over time could help lower AI compute costs and reduce reliance on NVIDIA GPUs.