GICC: A High-Performance Runtime for GPU-Initiated Communication and Coordination in Modern HPC Systems
New runtime cuts GPU coordination latency by up to 229x on HPE Slingshot networks.
Researchers Baodi Shan, Mauricio Araya-Polo, and Barbara Chapman have introduced GICC (GPU-Initiated Communication and Coordination), a high-performance runtime that lets GPU kernels control NIC-level operations directly, without host CPU intervention. This addresses a critical bottleneck in distributed GPU applications: existing runtimes rely on host-driven progress and lack mechanisms for recycling pre-staged NIC work across repeated GPU-triggered operations. GICC decouples coordination semantics from data movement and introduces asynchronous resource reclamation, in which the NIC signals completion to both GPU and host memory simultaneously, so a lightweight host thread can recycle NIC resources without adding latency to the GPU's fast path.
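The reclamation pattern is easiest to see in code. The sketch below is illustrative only, not GICC's actual API: the names (`doorbell`, `gpu_done`, `recycled`) and the slot/generation scheme are invented for this example, and a host thread stands in for the NIC, which in the real runtime would write both completion flags itself via DMA.

```cuda
// Minimal sketch of the dual-signal reclamation pattern (not GICC's API).
// A host thread emulates the NIC. Error checking omitted for brevity.
// Build: nvcc -std=c++17 -o reclaim_sketch reclaim_sketch.cu
#include <cstdio>
#include <atomic>
#include <thread>
#include <cuda_runtime.h>

constexpr int kSlots = 4;   // pre-staged NIC work descriptors (hypothetical)
constexpr int kOps   = 16;  // GPU-triggered operations to issue

// GPU fast path: ring a doorbell per operation, reusing a slot only after
// the host thread has recycled it. Waiting on completions (coordination)
// is decoupled from triggering (data movement) and deferred to the end.
__global__ void gpu_initiated_ops(volatile int* doorbell,
                                  volatile int* gpu_done,
                                  volatile int* recycled)
{
    for (int op = 0; op < kOps; ++op) {
        int slot = op % kSlots;
        int gen  = op / kSlots;          // reuse generation of this slot
        while (recycled[slot] < gen) {}  // usually satisfied already
        doorbell[slot] = op + 1;         // trigger the pre-staged NIC op
        __threadfence_system();          // make the doorbell host-visible
    }
    for (int s = 0; s < kSlots; ++s)     // coordination point: all done?
        while (gpu_done[s] < kOps - kSlots + s + 1) {}
}

int main() {
    cudaSetDeviceFlags(cudaDeviceMapHost);
    int* flags;  // [doorbell | gpu_done | recycled], zero-copy pinned memory
    cudaHostAlloc(&flags, 3 * kSlots * sizeof(int), cudaHostAllocMapped);
    for (int i = 0; i < 3 * kSlots; ++i) flags[i] = 0;
    int* d_flags;
    cudaHostGetDevicePointer(&d_flags, flags, 0);
    volatile int* doorbell = flags;
    volatile int* gpu_done = flags + kSlots;
    volatile int* recycled = flags + 2 * kSlots;

    std::atomic<int> host_done[kSlots];  // NIC -> host completion signals
    for (auto& h : host_done) h.store(0);

    // Emulated NIC: on each new doorbell value, "move the data", then
    // signal completion to BOTH the GPU- and host-visible flags at once.
    std::thread nic([&] {
        int seen[kSlots] = {0}, completed = 0;
        while (completed < kOps)
            for (int s = 0; s < kSlots; ++s) {
                int v = doorbell[s];
                if (v > seen[s]) {
                    seen[s] = v;         // a real NIC would DMA data here
                    gpu_done[s] = v;     // GPU-visible completion
                    host_done[s].store(v, std::memory_order_release);
                    ++completed;
                }
            }
    });

    // Lightweight reclamation thread: re-stages a slot's NIC descriptor as
    // soon as the host-visible completion lands, off the GPU's fast path.
    std::thread reclaimer([&] {
        int last[kSlots] = {0}, reclaimed = 0;
        while (reclaimed < kOps)
            for (int s = 0; s < kSlots; ++s) {
                int v = host_done[s].load(std::memory_order_acquire);
                if (v > last[s]) {
                    last[s] = v;                    // re-stage (omitted)
                    recycled[s] = recycled[s] + 1;  // slot reusable again
                    ++reclaimed;
                }
            }
    });

    gpu_initiated_ops<<<1, 1>>>(d_flags, d_flags + kSlots,
                                d_flags + 2 * kSlots);
    cudaDeviceSynchronize();
    nic.join();
    reclaimer.join();
    printf("all %d GPU-triggered ops completed and recycled\n", kOps);
    cudaFreeHost(flags);
    return 0;
}
```

The point of the dual signal is that the GPU never waits on descriptor recycling in its fast path: the host thread observes completions independently and re-stages slots in the background, so a GPU-triggered operation only stalls if it laps the reclaimer.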
On HPE Slingshot interconnects, which power 6 of the top 10 systems in the November 2025 Top500 (including the top 3), GICC reduces per-coordination latency by up to 229x and improves weak scaling efficiency by up to 25%. On InfiniBand, it achieves up to 1.95x lower put latency than NVSHMEM by eliminating unnecessary locking and synchronization. In an industrial stencil proxy running on 64 AMD MI250X GCDs, GICC reached 42% parallel efficiency versus 35.4% for GPU-aware MPI, whose communication time was more than 52% higher. The runtime is implemented on both NVIDIA and AMD GPUs.
- GICC enables GPU kernels to directly trigger NIC operations without host CPU involvement on the fast path
- On HPE Slingshot, reduces per-coordination latency by up to 229x and improves weak scaling efficiency by up to 25%
- On InfiniBand, achieves up to 1.95x lower put latency than NVSHMEM by eliminating unnecessary locking and synchronization
Why It Matters
Removing the host CPU from the GPU-to-GPU communication fast path eliminates a key bottleneck for HPC workloads on the interconnects behind most of today's top supercomputers.