Research & Papers

Co-Design and Evaluation of a CPU-Free MPI GPU Communication Abstraction and Implementation

A new API halves medium-message latency and speeds up a supercomputing benchmark by 28%.

Deep Dive

A team from the University of New Mexico, Oak Ridge National Laboratory, and Sandia National Laboratories designed a CPU-free MPI GPU communication API, in which GPU kernels trigger network operations directly rather than relying on the host CPU to drive communication. It leverages HPE Slingshot 11 network cards and integrates with the Cabana/Kokkos framework. The system demonstrated a 50% latency reduction in GPU ping-pong tests and a 28% speedup when scaling a halo-exchange benchmark to 8,192 GPUs on the Frontier supercomputer, enabling more efficient large-scale ML and HPC workloads.

Why It Matters

By removing the host CPU from the GPU communication path, a major bottleneck at scale, this work directly accelerates training of massive AI models and complex scientific simulations.