First algorithm to use GPU Tensor Cores for the Maximal Independent Set problem, traditionally limited to CUDA Cores?

First algorithm to use GPU Tensor Cores for the Maximal Independent Set problem, traditionally limited to CUDA Cores

Achieves up to 44.38x speedup on NVIDIA H200 GPU over SOTA methods, with average of 18.80x on H200?

Achieves up to 44.38x speedup on NVIDIA H200 GPU over SOTA methods, with average of 18.80x on H200

Maintains solution parity with near-optimal heuristics while reducing runtime from milliseconds to sub-millisecond on million-vertex graphs?

Maintains solution parity with near-optimal heuristics while reducing runtime from milliseconds to sub-millisecond on million-vertex graphs

Research & Papers

TC-MIS: Tensor Cores accelerate graph MIS by up to 44x

arXiv cs.DC May 29, 2026

⚡GPU Tensor Cores repurposed for graph algorithms deliver 44x speedups

Deep Dive

A team of researchers from IIT Jodhpur (Prajjwal Nijhara and Dip Sankar Banerjee) has introduced TC-MIS, a novel algorithm that repurposes GPU Tensor Cores (TCs) to solve the Maximal Independent Set (MIS) problem in graphs. Traditionally, MIS is a fundamental combinatorial problem used in resource allocation, scheduling, and network optimization, but it suffers from irregular memory access patterns and workload imbalance on GPUs. The key insight of TC-MIS is to reformulate the core phases of MIS computation as sparse matrix-vector multiplication (SpMV), which maps naturally onto Tensor Cores via Warp Matrix Multiply-Accumulate (WMMA) operations. This transforms irregular graph traversal into regular, massively parallel matrix operations.

The evaluation spans four NVIDIA microarchitectures: Ampere (RTX A5000), Ada Lovelace (L40S), Hopper (H200), and Blackwell (RTX 5080). TC-MIS achieves average speedups of 2.84x, 4.84x, 18.80x, and 5.20x respectively, with a maximum observed speedup of 44.38x on the H200 GPU over the state-of-the-art GPU-based MIS implementations. Solution quality remains comparable to established heuristics that produce near-maximum independent sets. This work opens a new direction for accelerating irregular graph algorithms using specialized hardware originally designed for matrix operations in ML and HPC.

Key Points

First algorithm to use GPU Tensor Cores for the Maximal Independent Set problem, traditionally limited to CUDA Cores
Achieves up to 44.38x speedup on NVIDIA H200 GPU over SOTA methods, with average of 18.80x on H200
Maintains solution parity with near-optimal heuristics while reducing runtime from milliseconds to sub-millisecond on million-vertex graphs

Why It Matters

Unlocks GPU Tensor Cores for graph algorithms, enabling faster optimization in networking, scheduling, and distributed systems.

Read Original Article

TC-MIS: Tensor Cores accelerate graph MIS by up to 44x

Why It Matters

Related Articles

🚀 Stay Ahead in AI