Research & Papers

GICC framework lets GPUs drive network ops, cutting coordination latency by up to 229x

New runtime cuts GPU coordination latency by up to 229x on HPE Slingshot networks...

Deep Dive

Researchers Baodi Shan, Mauricio Araya-Polo, and Barbara Chapman have introduced GICC (GPU-Initiated Communication and Coordination), a high-performance runtime that lets GPU kernels directly control NIC-level operations without host CPU intervention on the fast path. It targets a critical bottleneck in distributed GPU applications: existing runtimes rely on host-driven progress and lack mechanisms for recycling pre-staged NIC work across repeated GPU-triggered operations. GICC decouples coordination semantics from data movement and introduces asynchronous resource reclamation, in which the NIC signals completion to both GPU and host memory simultaneously. A lightweight host thread can then recycle NIC resources without injecting latency.
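
The reclamation scheme described above can be modeled with ordinary threads. The sketch below is a toy simulation in Python, not GICC's code or API: the names (`Slot`, `nic`, `host_recycler`, `gpu_kernel`, `run`) and the slot and operation counts are hypothetical, and Python threads stand in for the GPU kernel, the NIC, and the host thread. It only illustrates the idea of the NIC reporting each completion twice, so the GPU side waits on its own flag while the host re-arms consumed entries in the background.

```python
import queue
import threading

NUM_SLOTS = 4   # pre-staged NIC entries (illustrative value)
NUM_OPS = 16    # repeated GPU-triggered operations (illustrative value)


class Slot:
    """One pre-staged NIC work entry in this toy model (hypothetical)."""

    def __init__(self):
        self.armed = threading.Event()     # set while the entry is ready to trigger
        self.gpu_done = threading.Event()  # completion flag the "GPU" waits on
        self.armed.set()


def nic(triggers, host_completions, slots):
    """Toy NIC: for each triggered entry, signal completion to BOTH sides."""
    while True:
        idx = triggers.get()
        if idx is None:                 # shutdown marker
            host_completions.put(None)
            return
        slots[idx].gpu_done.set()       # completion visible to the GPU side
        host_completions.put(idx)       # completion visible to the host side


def host_recycler(host_completions, slots, recycled):
    """Lightweight host thread: re-arm consumed entries in the background."""
    while True:
        idx = host_completions.get()
        if idx is None:
            return
        slots[idx].armed.set()
        recycled.append(idx)


def gpu_kernel(triggers, slots):
    """Toy GPU kernel: trigger an entry, then wait only on its own flag."""
    for op in range(NUM_OPS):
        idx = op % NUM_SLOTS
        slot = slots[idx]
        slot.armed.wait()       # blocks only if the recycler has fallen behind
        slot.armed.clear()      # the entry is consumed by this trigger
        slot.gpu_done.clear()
        triggers.put(idx)       # "ring the doorbell" without asking the host
        slot.gpu_done.wait()
    triggers.put(None)


def run():
    """Run the three actors to completion and return the recycle order."""
    slots = [Slot() for _ in range(NUM_SLOTS)]
    triggers, host_completions = queue.Queue(), queue.Queue()
    recycled = []
    threads = [
        threading.Thread(target=nic, args=(triggers, host_completions, slots)),
        threading.Thread(target=host_recycler, args=(host_completions, slots, recycled)),
        threading.Thread(target=gpu_kernel, args=(triggers, slots)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return recycled
```

Calling `run()` returns the order in which entries were recycled. In this model the GPU side never asks the host for progress; it stalls on `armed` only if every pre-staged entry is still waiting to be re-armed.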

On HPE Slingshot interconnects—which power 6 of the top 10 systems in the November 2025 Top500, including the top 3—GICC reduces per-coordination latency by up to 229x and improves weak scaling efficiency by up to 25%. On InfiniBand, it achieves up to 1.95x lower put latency than NVSHMEM by eliminating unnecessary locking and synchronization. In an industrial stencil proxy running on 64 AMD MI250X GCDs, GPU-aware MPI incurred over 52% higher communication time than GICC, which achieved 42% parallel efficiency versus MPI's 35.4%. The framework is implemented on both NVIDIA and AMD GPUs.
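
To put the stencil-proxy figures on a common footing, the short calculation below converts them into relative terms. It uses only the numbers quoted above; the function names are illustrative, and because the source says "over 52%", the communication-time result is a lower bound rather than an exact value.

```python
def relative_gain(new, baseline):
    """Fractional improvement of `new` over `baseline` (0.10 means 10% better)."""
    return new / baseline - 1.0


def reduction_from_overhead(overhead):
    """If A takes `overhead` (as a fraction) more time than B,
    return the fraction by which B's time is lower than A's."""
    return 1.0 - 1.0 / (1.0 + overhead)


# Parallel efficiency on 64 MI250X GCDs: GICC 42% versus MPI 35.4%.
efficiency_gain = relative_gain(42.0, 35.4)

# GPU-aware MPI spent over 52% more time communicating than GICC.
comm_time_saved = reduction_from_overhead(0.52)

print(f"Relative efficiency gain over MPI: {efficiency_gain:.1%}")
print(f"Communication time saved vs MPI: at least {comm_time_saved:.1%}")
```

In other words, the 6.6-point efficiency gap is a relative improvement of about 18.6%, and "52% higher communication time" for MPI corresponds to GICC spending at least roughly 34% less time communicating, not 52% less.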

Key Points
  • GICC enables GPU kernels to directly trigger NIC operations without host CPU involvement on the fast path
  • On HPE Slingshot, reduces per-coordination latency by up to 229x and improves weak scaling efficiency by up to 25%
  • On InfiniBand, achieves up to 1.95x lower put latency than NVSHMEM by eliminating unnecessary locking and synchronization

Why It Matters

GICC removes the host CPU from the fast path of GPU-to-GPU communication, a bottleneck for HPC workloads on Slingshot-based systems, which include 6 of the top 10 machines in the November 2025 Top500.
