Research & Papers

WAVE: Write-once GPU kernels run on NVIDIA, AMD, Intel, and Apple hardware

After reading more than 5,000 pages of GPU architecture docs, a developer built one portable toolchain that targets the ISAs of all four major GPU vendors.

Deep Dive

After reading more than 5,000 pages of GPU architecture documentation (NVIDIA PTX, AMD ISA, Intel Xe, and reverse-engineered Apple GPU specs), a developer (not-your-typical-cs) noticed that all four vendors implement the same 11 fundamental operations under different names. This insight led to WAVE, a new portable instruction set architecture (ISA) that abstracts away vendor-specific low-level details. With WAVE, you write a GPU kernel once in a unified intermediate representation, compile it to a portable binary, and deploy it through backends including Metal, PTX, HIP, and SYCL. The project has been verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X hardware to check cross-platform correctness.
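The "same operations, different names" observation can be illustrated with a minimal sketch. It is not WAVE's actual ISA or translator code: the two portable op names and the `lower` helper are hypothetical, invented for this example, and the summary does not list WAVE's 11 operations. Only the backend-side spellings (the PTX `%tid.x` register and `bar.sync`, HIP's `threadIdx.x` and `__syncthreads()`, Metal's `thread_position_in_threadgroup` attribute and `threadgroup_barrier`, SYCL's `get_local_id` and `group_barrier`) are real vendor names.

```python
# Hypothetical portable op names, invented for this sketch.
# They are NOT taken from the WAVE ISA, whose operations are not listed here.
LOCAL_THREAD_ID = "local_thread_id"
WORKGROUP_BARRIER = "workgroup_barrier"

# How each backend named in the article spells the same two concepts.
# "item" in the SYCL entries is an assumed variable name for a sycl::nd_item.
SPELLINGS = {
    "PTX": {
        LOCAL_THREAD_ID: "%tid.x",
        WORKGROUP_BARRIER: "bar.sync 0;",
    },
    "HIP": {
        LOCAL_THREAD_ID: "threadIdx.x",
        WORKGROUP_BARRIER: "__syncthreads();",
    },
    "Metal": {
        LOCAL_THREAD_ID: "[[thread_position_in_threadgroup]]",
        WORKGROUP_BARRIER: "threadgroup_barrier(mem_flags::mem_threadgroup);",
    },
    "SYCL": {
        LOCAL_THREAD_ID: "item.get_local_id(0)",
        WORKGROUP_BARRIER: "sycl::group_barrier(item.get_group());",
    },
}


def lower(program, backend):
    """Translate a list of portable op names into one backend's spelling."""
    if backend not in SPELLINGS:
        raise ValueError(f"unknown backend: {backend}")
    table = SPELLINGS[backend]
    return [table[op] for op in program]


# One portable "program", four backend-specific renderings.
program = [LOCAL_THREAD_ID, WORKGROUP_BARRIER]
for backend in SPELLINGS:
    print(backend, "->", lower(program, backend))
```

A table lookup like this only captures the naming part of the problem; a real translator would also have to handle registers, memory spaces, and scheduling differences between vendors.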

Co-author Onyinye built a PyTorch integration that produces identical training results across all supported backends, so developers can switch hardware without code changes or accuracy loss. The WAVE toolchain includes a compiler, a runtime, and thin backend translators. The project is open source on GitHub under the name wave, with a preprint on arXiv and full documentation available. It can be installed with pip install wave-gpu. WAVE aims to reduce the complexity of writing and maintaining GPU code for heterogeneous computing environments by removing the need to manually port kernels between vendors' ecosystems.
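This summary does not spell out what "identical" means for the training results. One strict reading is bit-for-bit equality, which is a much stronger property than agreement within a floating-point tolerance. The sketch below shows the difference using only the Python standard library; the loss values and the names `losses_backend_a` and `losses_backend_b` are made up for illustration and are not WAVE measurements.

```python
import math
import struct


def bit_pattern(x):
    """Return the raw IEEE 754 double-precision bytes of a float."""
    return struct.pack("<d", x)


def bitwise_identical(a, b):
    """True only if both sequences hold exactly the same bit patterns."""
    return len(a) == len(b) and all(
        bit_pattern(x) == bit_pattern(y) for x, y in zip(a, b)
    )


def approx_equal(a, b, rel_tol=1e-9):
    """True if every pair of values agrees within a relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


# Made-up per-step losses standing in for two different backends.
# The last entries differ only by floating-point rounding.
losses_backend_a = [0.5, 0.25, 0.1 + 0.2]
losses_backend_b = [0.5, 0.25, 0.3]

print(bitwise_identical(losses_backend_a, losses_backend_b))  # False
print(approx_equal(losses_backend_a, losses_backend_b))       # True
```

Two runs can pass the tolerance check and still fail the bitwise one, so a claim of identical results across vendors is worth reading with the comparison method in mind.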

Key Points
  • Covers 16 microarchitectures across NVIDIA, AMD, Intel, and Apple GPUs
  • Write kernel once in WAVE ISA; compile to portable binary; deploy via Metal, PTX, HIP, or SYCL backends
  • PyTorch integration yields identical training results on Apple M4 Pro, NVIDIA T4, and AMD MI300X

Why It Matters

Aims to eliminate GPU vendor lock-in by letting a single kernel run across all major hardware without manual porting.