Research & Papers

Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU

Researchers replace a costly symbolic step with lightweight HyperLogLog estimators for faster sparse matrix math.

Deep Dive

Researchers Yifan Li and Giulia Guidi have introduced 'Ocean', a novel GPU-accelerated framework for Sparse General Matrix-Matrix Multiplication (SpGEMM) that challenges a fundamental assumption in high-performance computing. SpGEMM is a critical but irregular kernel used in scientific simulations, graph analytics, and machine learning. Current GPU solutions use a two-pass workflow in which a 'symbolic pass' computes the exact nonzero structure of the output so that result memory can be allocated precisely, consuming roughly 28% of total runtime. Ocean's key insight is to question whether this exact, costly step is necessary at all.
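The conventional two-pass workflow can be sketched as follows. This is a minimal, illustrative Python version of the general idea, not the paper's GPU implementation: the symbolic pass counts exact nonzeros per output row so storage can be allocated, and the numeric pass then computes the values.

```python
# Minimal sketch of the conventional two-pass SpGEMM workflow
# (illustrative Python, not the paper's GPU kernels).
# Matrices use a dict-of-rows form: row index -> {col index: value}.

def symbolic_pass(A, B):
    """Exact pass: count nonzeros in each output row of C = A @ B."""
    nnz_per_row = {}
    for i, a_row in A.items():
        cols = set()
        for k in a_row:                 # every nonzero A[i, k] ...
            cols |= set(B.get(k, {}))   # ... touches B's row k
        nnz_per_row[i] = len(cols)
    return nnz_per_row                  # used to size C exactly

def numeric_pass(A, B):
    """Second pass: actually accumulate the products."""
    C = {}
    for i, a_row in A.items():
        acc = {}
        for k, a_val in a_row.items():
            for j, b_val in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        C[i] = acc
    return C

A = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
B = {0: {0: 4.0}, 1: {0: 5.0, 1: 6.0}}
print(symbolic_pass(A, B))   # {0: 2, 1: 2}
print(numeric_pass(A, B))    # {0: {0: 14.0, 1: 12.0}, 1: {0: 15.0, 1: 18.0}}
```

Note that the symbolic pass repeats nearly all of the traversal work of the numeric pass without producing any output values, which is exactly the overhead Ocean targets.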

Instead, Ocean employs a fast, estimation-based workflow. It replaces the symbolic pass with lightweight HyperLogLog probabilistic estimators that predict the output structure. A runtime analysis strategy then dynamically selects the optimal computational path and configures the memory accumulators, avoiding the upfront symbolic cost. The team also designed a novel hybrid accumulator that leverages both shared and global GPU memory via a hash-based approach for efficient data handling.
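A HyperLogLog estimator counts distinct elements (here, distinct output column indices) using only a small array of registers instead of an exact set. The sketch below is a textbook-style Python version for intuition only; the paper's GPU estimator, hash function, and register sizing are not reproduced here.

```python
# Minimal HyperLogLog sketch for estimating the number of distinct
# output columns (illustrative; not the paper's GPU estimator).
import hashlib

class HyperLogLog:
    def __init__(self, p=10):
        self.p = p                       # 2**p registers
        self.m = 1 << p
        self.regs = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16)
        idx = h & (self.m - 1)           # low p bits pick a register
        w = h >> self.p
        rank = 1                         # position of first set bit
        while w & 1 == 0 and rank <= 64: # (a standard HLL variant)
            rank += 1
            w >>= 1
        self.regs[idx] = max(self.regs[idx], rank)

    def estimate(self):
        z = sum(2.0 ** -r for r in self.regs)
        return self.alpha * self.m * self.m / z

hll = HyperLogLog()
for col in range(5000):                  # 5000 distinct column indices
    hll.add(col)
est = hll.estimate()
print(round(est))                        # close to 5000, within a few %
```

The appeal for SpGEMM is that each register update is a cheap hash-and-max operation, so predicting a row's output size costs far less than enumerating its exact nonzero pattern, at the price of a small, bounded relative error.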

The result is consistently superior performance across a wide range of sparse matrices, including both square and rectangular forms. Benchmarks on modern NVIDIA A100 and H100 GPU architectures show speedups of 1.4x to 2.8x over state-of-the-art libraries like nsparse and cuSPARSE. This work, accepted to the 2026 International Conference on Supercomputing (ICS), demonstrates that trading exact pre-computation for intelligent estimation can unlock significant performance gains for a foundational computational operation.

Key Points
  • Replaces slow symbolic pass (28% of runtime) with fast HyperLogLog estimators for output prediction.
  • Introduces a hybrid hash-based accumulator using both shared and global GPU memory for efficiency.
  • Achieves 1.4x to 2.8x speedups over leading methods on NVIDIA A100/H100 GPUs for sparse matrix math.
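The hybrid accumulator idea from the second key point can be modeled in miniature: a small fixed-size hash table stands in for fast GPU shared memory, and overflow spills to an unbounded structure standing in for global memory. The capacities, probing scheme, and spill policy below are illustrative assumptions, not the paper's actual design.

```python
# Toy model of a hybrid hash-based accumulator: a small fixed-size
# table ("shared memory") with overflow spilled to an unbounded dict
# ("global memory"). Sizes and probing are assumptions for illustration.

SHARED_SLOTS = 8    # tiny on purpose, to force spills

class HybridAccumulator:
    def __init__(self):
        self.keys = [None] * SHARED_SLOTS   # "shared memory" table
        self.vals = [0.0] * SHARED_SLOTS
        self.global_acc = {}                # "global memory" fallback

    def add(self, col, val):
        idx = hash(col) % SHARED_SLOTS
        for probe in range(SHARED_SLOTS):   # linear probing
            slot = (idx + probe) % SHARED_SLOTS
            if self.keys[slot] == col:
                self.vals[slot] += val      # hit: accumulate in place
                return
            if self.keys[slot] is None:
                self.keys[slot] = col       # empty slot: claim it
                self.vals[slot] = val
                return
        # shared table full: spill to global memory
        self.global_acc[col] = self.global_acc.get(col, 0.0) + val

    def items(self):
        merged = dict(self.global_acc)      # combine both tiers
        for k, v in zip(self.keys, self.vals):
            if k is not None:
                merged[k] = merged.get(k, 0.0) + v
        return merged

acc = HybridAccumulator()
for col in range(12):        # 12 distinct columns > 8 slots -> spills
    acc.add(col, 1.0)
acc.add(3, 1.0)              # repeated column accumulates in place
out = acc.items()
print(len(out), out[3])      # 12 distinct columns; column 3 holds 2.0
```

The design intuition is that most accumulations hit the small fast tier, while the slow tier only absorbs the rows whose output density exceeds what shared memory can hold.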

Why It Matters

Accelerates core computations for large-scale AI training, scientific simulations, and graph analytics, reducing cost and time.