GEN-Graph: Heterogeneous PIM Accelerator for General Computational Patterns in Graph-based Dynamic Programming
New heterogeneous PIM architecture tackles both DNA sequencing and network analysis with specialized compute tiles.
A research team led by Yanru Chen has unveiled GEN-Graph, a heterogeneous processing-in-memory (PIM) accelerator that targets a long-standing bottleneck in graph-based dynamic programming (DP). Graph-based DP is critical for applications ranging from DNA sequence alignment to social network analysis, but these workloads have conflicting computational patterns: some are compute-bound and regular, while others are memory-bound and irregular. GEN-Graph integrates two specialized compute tiles within a single 2.5D package: a Matrix-tile using processing-using-memory (PUM) for tasks like all-pairs shortest path, and a Traversal-tile using processing-near-memory (PNM) for genomic sequence alignment.
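To make the "compute-bound and regular" category concrete, here is a minimal Python sketch of the textbook Floyd-Warshall recurrence for all-pairs shortest path. It is a software illustration only and not GEN-Graph's Matrix-tile implementation; the function name `floyd_warshall` and the four-node example graph are assumptions made for this sketch.

```python
import math

INF = math.inf  # marks "no direct edge"


def floyd_warshall(weights):
    """Return the all-pairs shortest-path matrix for a dense weight matrix.

    Illustrative only: the triple loop touches every (i, j) cell for every
    intermediate node k, so the access pattern is fixed and independent of
    the data. Assumes the graph has no negative cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):
        for i in range(n):
            dik = dist[i][k]
            for j in range(n):
                # DP recurrence: either keep the current path or route via k.
                cand = dik + dist[k][j]
                if cand < dist[i][j]:
                    dist[i][j] = cand
    return dist


# Hypothetical 4-node directed graph used only for this example.
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
shortest = floyd_warshall(graph)
```

Every cell update here is the same min-plus operation over a dense matrix, which is the kind of uniform work the article describes as suited to a matrix-oriented tile. Sequence-to-graph alignment, by contrast, follows graph edges whose locations depend on the data.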
This hardware-software co-design employs recursive partitioning and reconfigurable windowed bit-parallel logic to ensure exact computation rather than approximation. On the all-pairs shortest path problem, the Matrix-tile is reported to achieve a 42.8x speedup and 392x better energy efficiency compared to NVIDIA's flagship H100 GPU. For DNA sequence-to-graph alignment, the Traversal-tile sustains a throughput of 2.56 million short-reads per second, outperforming state-of-the-art accelerators by up to 2.56x.
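For readers unfamiliar with bit-parallel DP, the sketch below shows the general idea using the classic Myers bit-vector algorithm for exact edit distance between two plain strings. This is a swapped-in, simpler technique: it does not model GEN-Graph's windowed, reconfigurable logic or alignment against a graph, and the function name `bitparallel_edit_distance` is an assumption made for this example.

```python
def bitparallel_edit_distance(pattern, text):
    """Exact global edit distance via the Myers bit-vector recurrence.

    Illustrative sketch: one DP column is packed into integers, where pv/mv
    mark rows whose vertical difference is +1/-1. Each text character then
    updates the whole column with a handful of bitwise operations.
    """
    m = len(pattern)
    if m == 0:
        return len(text)

    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    mask = (1 << m) - 1   # keep only the m column bits
    high = 1 << (m - 1)   # bit for the last pattern row
    pv, mv = mask, 0      # first column is 0, 1, 2, ..., m
    score = m

    for c in text:
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)   # horizontal +1 differences
        mh = pv & xh                    # horizontal -1 differences
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift in a +1 at the top row: global (not substring) distance.
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
    return score


distance = bitparallel_edit_distance("kitten", "sitting")
```

The result is exact, not a heuristic estimate, which mirrors the article's point that bit-parallel logic can speed up DP without giving up correctness. Python integers are arbitrary precision, so the sketch needs no windowing; hardware with fixed-width words would have to split long patterns into windows.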
GEN-Graph represents a significant leap toward specialized hardware that can efficiently handle the diverse computational patterns of modern data science. By matching hardware specialization to algorithmic structure, it provides the first scalable solution for general DP dataflows, potentially accelerating breakthroughs in personalized medicine, logistics optimization, and complex network analysis where exact results are non-negotiable.
- Specialized Matrix-tile achieves 42.8x speedup over NVIDIA H100 GPU for network pathfinding
- Traversal-tile handles 2.56 million DNA short-reads per second, 2.56x faster than current accelerators
- Hardware-software co-design ensures exact computation for critical applications in genomics and analytics
Why It Matters
Enables faster, more energy-efficient genomic analysis and network optimization where exact results are critical for scientific and business decisions.