Research & Papers

Accelerating Triangle Counting with Real Processing-in-Memory Systems

Researchers use commercial PIM hardware to tackle the memory bottlenecks of triangle counting in large graphs.

Deep Dive

A research team from ETH Zurich and the University of Padua has published a paper presenting what the authors describe as the first Triangle Counting (TC) algorithm for a commercially available Processing-in-Memory (PIM) system. TC counts the triangles (sets of three mutually connected vertices) in a graph, and it is a memory-intensive operation used in social network analysis, cybersecurity, and bioinformatics. Conventional CPU systems scale poorly on TC because it makes many memory accesses across large regions of memory with little data reuse, so performance is limited by memory rather than by compute.
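
To make the workload concrete, here is a minimal Python sketch of exact triangle counting over an edge list. It is an illustration only, not the paper's implementation; the function name `count_triangles` and the smaller-to-larger edge orientation are assumptions chosen for this example.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    # Orient every edge from the smaller to the larger vertex id so that
    # each triangle a < b < c is discovered exactly once, from vertex a.
    higher = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = (u, v) if u < v else (v, u)
        higher[a].add(b)  # a set also absorbs duplicate edges

    total = 0
    for a, nbrs in higher.items():
        for b in nbrs:
            # Any vertex c that is a higher neighbour of both a and b
            # closes the triangle (a, b, c).
            total += len(nbrs & higher.get(b, set()))
    return total
```

The set intersections read adjacency data scattered across memory and rarely revisit it, which is the low-reuse access pattern described above.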

The researchers targeted the UPMEM system, the first commercially available PIM architecture, which places processing cores inside the DRAM chips, next to the memory banks. UPMEM has two main limitations: each core has only a small amount of memory available, and communication between cores is expensive. To work around them, the team uses vertex coloring to partition the graph so that cores do not need to communicate with one another, and reservoir sampling to keep each core's share of the edges within its DRAM bank. They further use the Misra-Gries summary to speed up counting on graphs with high-degree nodes, and uniform sampling of the graph's edges to produce faster approximate results.
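
The coloring idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' scheme: `color_of` is a hypothetical stand-in for a hash function, each "core" is identified by a sorted triple of colors, and a core counts only the triangles whose vertex colors match its triple exactly, so no cross-core correction is needed.

```python
from collections import Counter, defaultdict
from itertools import combinations_with_replacement


def color_of(v, num_colors):
    # Hypothetical colouring; a real system would hash the vertex id.
    return v % num_colors


def partition_edges(edges, num_colors):
    """Send each edge to every core whose colour triple covers its endpoints."""
    triples = list(combinations_with_replacement(range(num_colors), 3))
    buckets = {t: [] for t in triples}
    for u, v in edges:
        need = Counter((color_of(u, num_colors), color_of(v, num_colors)))
        for t in triples:
            have = Counter(t)
            if all(have[c] >= n for c, n in need.items()):
                buckets[t].append((u, v))
    return buckets


def count_local(edges, triple, num_colors):
    """Count triangles in one bucket whose sorted colours equal `triple`."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[min(u, v)].add(max(u, v))
    total = 0
    for a in list(adj):
        for b in adj[a]:
            for c in adj[a] & adj.get(b, set()):
                colors = tuple(sorted(color_of(x, num_colors) for x in (a, b, c)))
                if colors == triple:
                    total += 1
    return total


def count_triangles_partitioned(edges, num_colors):
    # Each bucket is processed independently; only the final sum is shared.
    buckets = partition_edges(edges, num_colors)
    return sum(count_local(es, t, num_colors) for t, es in buckets.items())
```

Because all three edges of a triangle land in the bucket matching its colors, every bucket can be counted without consulting any other, at the cost of storing some edges more than once.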

In benchmark tests, their PIM-based implementation outperformed state-of-the-art CPU implementations when processing dynamic graphs stored in Coordinate List (COO) format, where the graph is a plain list of (source, destination) edge pairs. The result is significant because it shows a measured performance gain from PIM on a real, memory-bound workload rather than a projected one. It suggests that the UPMEM architecture is effective for TC's memory access pattern, and it offers a starting point for adapting other graph algorithms to PIM systems, which could speed up the analysis of massive networks such as social media graphs or internet topology maps.
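
Reservoir sampling fits naturally with a COO edge list, since edges can be consumed one pair at a time. The Python sketch below is a generic illustration, not the paper's code: the function names, the `capacity` parameter, and the scaling rule are assumptions. The scaling follows from the fact that a uniform sample of M out of t edges keeps all three edges of a given triangle with probability M(M-1)(M-2) / (t(t-1)(t-2)).

```python
import random


def reservoir_sample_edges(edge_stream, capacity, rng):
    """Keep a uniform random sample of at most `capacity` edges (Algorithm R)."""
    sample = []
    seen = 0
    for edge in edge_stream:
        seen += 1
        if len(sample) < capacity:
            sample.append(edge)  # reservoir not yet full: keep everything
        else:
            # Replace a random slot with probability capacity / seen.
            j = rng.randrange(seen)
            if j < capacity:
                sample[j] = edge
    return sample, seen


def triangle_scale_factor(seen, capacity):
    """Factor that turns a triangle count on the sample into an estimate."""
    if seen <= capacity:
        return 1.0  # nothing was dropped, so the count is exact
    if capacity < 3:
        raise ValueError("capacity must be at least 3 to hold a triangle")
    m, t = capacity, seen
    return (t * (t - 1) * (t - 2)) / (m * (m - 1) * (m - 2))
```

A core would count triangles among the sampled edges and multiply by `triangle_scale_factor(seen, capacity)` to estimate the count for all the edges it was sent.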

Key Points
  • First triangle counting algorithm designed for UPMEM's commercial PIM hardware, beating state-of-the-art CPU implementations on dynamic graphs in COO format.
  • Uses vertex coloring and reservoir sampling to overcome PIM core memory limits and communication costs.
  • Enables faster triangle counting for network analysis, security, and bioinformatics applications.

Why It Matters

Shows that commercial PIM hardware can accelerate a real memory-bound graph workload, pointing toward faster network security and social graph analysis.