MERBIT: New GPU method speeds up sparse matrix workloads by 27%

arXiv cs.DC May 11, 2026

⚡ Averages a 1.27x speedup over cuSPARSE COO on 50 irregular datasets, targeting PageRank-style workloads.

Deep Dive

Sparse matrix-vector multiplication (SpMV) is central to iterative tasks such as graph analytics and sparse solvers, but real-world graphs have irregular sparsity patterns that make GPU acceleration hard. A new paper from Qi Zhang and colleagues introduces MERBIT, a method designed specifically for repeated SpMV on irregular, graph-like matrices (e.g., PageRank). MERBIT combines two proven ideas: at the global level, merge-path partitioning balances work across both nonzeros and row boundaries; at the local level, each resulting segment is encoded with a compact bit-field descriptor. This two-level approach improves workload balance and enables coalesced memory access both when loading the matrix and when writing outputs.
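To make the two levels concrete, here is a minimal Python sketch of merge-path partitioning over a CSR matrix, followed by a one-bit-per-nonzero row-boundary mask for each segment. It is not MERBIT's code: the partitioning follows the generic merge-path idea, and the bit layout in `row_end_bits` is an assumption made for illustration, since the article does not specify the descriptor format. All function names and the sample matrix are hypothetical.

```python
def merge_path_search(diagonal, row_end, nnz):
    """Binary-search the merge path: return (rows consumed, nonzeros consumed)
    after `diagonal` steps of merging row ends with nonzero indices."""
    lo, hi = max(diagonal - nnz, 0), min(diagonal, len(row_end))
    while lo < hi:
        mid = (lo + hi) // 2
        if row_end[mid] <= diagonal - mid - 1:
            lo = mid + 1
        else:
            hi = mid
    return lo, diagonal - lo


def partition(row_ptr, num_segments):
    """Split (rows + nonzeros) into equal-length pieces of the merge path."""
    row_end, nnz = row_ptr[1:], row_ptr[-1]
    total = len(row_end) + nnz
    per_seg = -(-total // num_segments)  # ceiling division
    cuts = [merge_path_search(min(s * per_seg, total), row_end, nnz)
            for s in range(num_segments + 1)]
    return list(zip(cuts[:-1], cuts[1:]))


def row_end_bits(row_ptr, nz_start, nz_stop):
    """Assumed descriptor layout (not from the paper): bit b is set when
    nonzero nz_start + b is the last nonzero of its row."""
    last = {end - 1 for end in row_ptr[1:] if end > 0}
    bits = 0
    for b, k in enumerate(range(nz_start, nz_stop)):
        if k in last:
            bits |= 1 << b
    return bits


def spmv_merge_path(row_ptr, cols, vals, x, num_segments):
    """SpMV where each segment is processed independently, then rows that
    were split across segments are repaired with carried partial sums."""
    y = [0.0] * (len(row_ptr) - 1)
    carries = []
    for (row, k), (row_stop, k_stop) in partition(row_ptr, num_segments):
        acc = 0.0
        while row < row_stop:            # rows that end inside this segment
            while k < row_ptr[row + 1]:
                acc += vals[k] * x[cols[k]]
                k += 1
            y[row] = acc
            acc = 0.0
            row += 1
        while k < k_stop:                # trailing part of an unfinished row
            acc += vals[k] * x[cols[k]]
            k += 1
        carries.append((row, acc))
    for row, acc in carries:             # fix-up pass for split rows
        if row < len(y):
            y[row] += acc
    return y


# Hypothetical 4x4 matrix with row lengths 2, 0, 1, 5 (one long row).
ROW_PTR = [0, 2, 2, 3, 8]
COLS = [0, 1, 2, 0, 1, 2, 3, 0]
VALS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X = [1.0, 1.0, 1.0, 1.0]

SEGMENTS = partition(ROW_PTR, 3)
MASKS = [row_end_bits(ROW_PTR, a[1], b[1]) for a, b in SEGMENTS]
Y = spmv_merge_path(ROW_PTR, COLS, VALS, X, 3)
```

With three segments, each one receives four of the twelve merge items (8 nonzeros plus 4 row ends), even though the last row alone holds five nonzeros. That long row is split across two segments and stitched back together in the fix-up pass, which is the balancing effect the article attributes to the global level.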

On 50 large irregular datasets, MERBIT achieved average speedups of 1.27x (single precision) and 1.25x (double precision) over NVIDIA's cuSPARSE COO format. It also outperformed other baselines, including Ginkgo and previously published methods. The paper describes three additional optimizations that further boost performance. MERBIT is currently a research prototype, but its consistent gains across diverse real-world graphs suggest it could become a standard building block for high-performance graph analytics and iterative solvers on GPUs.
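A speedup factor and a percentage saving are easy to conflate, so the small sketch below converts one into the other. It assumes speedup is defined as baseline time divided by MERBIT time, which is the usual convention but is not stated in the article; the function name is hypothetical.

```python
def runtime_reduction(speedup):
    """Fraction of the baseline runtime saved at a given speedup factor,
    assuming speedup = baseline_time / new_time."""
    return 1.0 - 1.0 / speedup


SINGLE = runtime_reduction(1.27)  # single-precision average from the article
DOUBLE = runtime_reduction(1.25)  # double-precision average from the article
print(f"single: {SINGLE:.1%} less time, double: {DOUBLE:.1%} less time")
```

Under that assumption, the headline's 27% describes the gain in throughput; the corresponding cut in runtime per SpMV is closer to 21% in single precision and 20% in double precision.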

Key Points

MERBIT uses merge-path partitioning at global level and compact bit-field descriptors at local level for better workload balance.
Outperforms cuSPARSE COO by an average of 1.27x (single precision) and 1.25x (double precision) on 50 irregular datasets.
Designed for iterative workloads like PageRank, sparse solvers, and large-scale graph analytics.

Why It Matters

Faster SpMV means quicker PageRank iterations and faster sparse solvers on GPUs, which in turn accelerates real-world graph analytics and scientific computing.
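The link between SpMV and PageRank is direct: each power-iteration step is one SpMV plus a cheap vector update. The CPU-only Python sketch below shows that structure on a hypothetical four-edge graph. It assumes a column-stochastic transition matrix stored in CSR with no dangling nodes, and it has nothing to do with MERBIT's GPU kernels.

```python
def spmv_csr(row_ptr, cols, vals, x):
    """Plain row-by-row CSR sparse matrix-vector product."""
    y = [0.0] * (len(row_ptr) - 1)
    for r in range(len(y)):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += vals[k] * x[cols[k]]
    return y


def pagerank(row_ptr, cols, vals, damping=0.85, iters=100):
    """Power iteration: one SpMV per step dominates the cost."""
    n = len(row_ptr) - 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        spread = spmv_csr(row_ptr, cols, vals, rank)
        rank = [(1.0 - damping) / n + damping * s for s in spread]
    return rank


# Hypothetical graph with edges 0->1, 0->2, 1->2, 2->0.
# Entry (i, j) is 1/outdegree(j) when page j links to page i.
ROW_PTR = [0, 1, 2, 4]
COLS = [2, 0, 0, 1]
VALS = [1.0, 0.5, 0.5, 1.0]

RANKS = pagerank(ROW_PTR, COLS, VALS)
print(RANKS)
```

Because the matrix is reused unchanged in every iteration, any one-time cost of partitioning or encoding it can be spread over all the SpMV calls, which is why a method tuned for repeated SpMV fits this workload.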
