WAVE: Write-once GPU kernels run on NVIDIA, AMD, Intel, and Apple hardware

r/MachineLearning May 26, 2026

⚡ After 5,000 pages of GPU architecture docs, a developer unified all major ISAs into one portable toolchain.

Deep Dive

After reading more than 5,000 pages of GPU architecture documentation (NVIDIA PTX, AMD ISA, Intel Xe, and reverse-engineered Apple GPU specs), a developer posting as not-your-typical-cs noticed that all four vendors implement the same 11 fundamental operations under different names. That observation led to WAVE, a portable instruction set architecture (ISA) that abstracts away vendor-specific low-level details. With WAVE, you write a GPU kernel once in a unified intermediate representation, compile it to a portable binary, and deploy it through the Metal, PTX, HIP, or SYCL backend. To check cross-platform correctness, the project has been verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X hardware.
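The "same operations, different names" observation can be illustrated with a small lookup table. This is a minimal sketch, not the WAVE API and not its list of 11 operations: the concept keys (`simd_group`, `workgroup`, `workgroup_memory`) and the `vendor_term` helper are hypothetical names chosen for illustration, while the vendor terms themselves are the ones each vendor's public documentation uses.

```python
# Illustrative only: NOT the WAVE API and NOT its operation list.
# Each hypothetical concept key maps to the term a vendor uses for it.
VENDOR_TERMS = {
    "simd_group": {
        "nvidia": "warp",
        "amd": "wavefront",
        "intel": "sub-group",
        "apple": "SIMD-group",
    },
    "workgroup": {
        "nvidia": "thread block",
        "amd": "workgroup",
        "intel": "work-group",
        "apple": "threadgroup",
    },
    "workgroup_memory": {
        "nvidia": "shared memory",
        "amd": "local data share (LDS)",
        "intel": "shared local memory (SLM)",
        "apple": "threadgroup memory",
    },
}


def vendor_term(concept: str, vendor: str) -> str:
    """Return the vendor-specific name for a vendor-neutral concept."""
    try:
        return VENDOR_TERMS[concept][vendor]
    except KeyError as exc:
        # Covers both an unknown concept and an unknown vendor.
        raise ValueError(f"no term for {concept!r} on {vendor!r}") from exc
```

A portable ISA takes the same idea one level down: instead of translating vocabulary, it translates a single vendor-neutral instruction into each backend's equivalent.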

Co-author Onyinye built a PyTorch integration that is reported to produce identical training results across all supported backends, so developers can switch hardware without code changes or accuracy loss. The WAVE toolchain includes a compiler, a runtime, and thin backend translators. The project is open source on GitHub under the name wave, with a preprint on arXiv and full documentation available, and it can be installed with pip install wave-gpu. WAVE aims to reduce the complexity of writing and maintaining GPU code for heterogeneous computing environments by removing the need to manually port kernels between vendors' ecosystems.
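One way a reader might check the "identical training results" claim is to log the per-step loss on each device and compare the logs. The sketch below uses only the Python standard library; the function name `mismatched_steps` and the idea of comparing loss histories are assumptions made for illustration, not something the WAVE project documents.

```python
import math


def mismatched_steps(reference, candidate, rel_tol=0.0):
    """Return the step indices where two loss histories disagree.

    With rel_tol=0.0 the comparison is exact, which is what
    "identical" results would require. A small positive rel_tol
    relaxes this to "numerically close".
    """
    if len(reference) != len(candidate):
        raise ValueError("loss histories must have the same length")
    return [
        step
        for step, (a, b) in enumerate(zip(reference, candidate))
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0)
    ]
```

An empty list from an exact comparison means the two runs matched at every logged step. A non-empty list shows the first step at which the backends diverged.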

Key Points

Covers 16 microarchitectures across NVIDIA, AMD, Intel, and Apple GPUs
Write kernel once in WAVE ISA; compile to portable binary; deploy via Metal, PTX, HIP, or SYCL backends
PyTorch integration yields identical training results on Apple M4 Pro, NVIDIA T4, and AMD MI300X

Why It Matters

Aims to eliminate GPU vendor lock-in by letting a single kernel run across all major GPU hardware without manual porting.
