Research & Papers

TC-MIS: Tensor Cores accelerate graph MIS by up to 44x

GPU Tensor Cores repurposed for graph algorithms deliver 44x speedups

Deep Dive

A team of researchers from IIT Jodhpur (Prajjwal Nijhara and Dip Sankar Banerjee) has introduced TC-MIS, a novel algorithm that repurposes GPU Tensor Cores (TCs) to solve the Maximal Independent Set (MIS) problem in graphs. Traditionally, MIS is a fundamental combinatorial problem used in resource allocation, scheduling, and network optimization, but it suffers from irregular memory access patterns and workload imbalance on GPUs. The key insight of TC-MIS is to reformulate the core phases of MIS computation as sparse matrix-vector multiplication (SpMV), which maps naturally onto Tensor Cores via Warp Matrix Multiply-Accumulate (WMMA) operations. This transforms irregular graph traversal into regular, massively parallel matrix operations.

The evaluation spans four NVIDIA microarchitectures: Ampere (RTX A5000), Ada Lovelace (L40S), Hopper (H200), and Blackwell (RTX 5080). TC-MIS achieves average speedups of 2.84x, 4.84x, 18.80x, and 5.20x respectively, with a maximum observed speedup of 44.38x on the H200 GPU over the state-of-the-art GPU-based MIS implementations. Solution quality remains comparable to established heuristics that produce near-maximum independent sets. This work opens a new direction for accelerating irregular graph algorithms using specialized hardware originally designed for matrix operations in ML and HPC.

Key Points
  • First algorithm to use GPU Tensor Cores for the Maximal Independent Set problem, traditionally limited to CUDA Cores
  • Achieves up to 44.38x speedup on NVIDIA H200 GPU over SOTA methods, with average of 18.80x on H200
  • Maintains solution parity with near-optimal heuristics while reducing runtime from milliseconds to sub-millisecond on million-vertex graphs

Why It Matters

Unlocks GPU Tensor Cores for graph algorithms, enabling faster optimization in networking, scheduling, and distributed systems.