Research & Papers

SOLANET delivers 11x speedup on GPU-accelerated neighbor graph construction for billion-scale data

New distributed neighbor graph toolkit scales to 2 billion points with near-linear speedup across 512 AMD APUs

Deep Dive

SOLANET tackles the problem of building neighbor graphs, a fundamental component of AI and data analytics workloads, at billion-point scale. The toolkit first partitions the data across GPUs, and each GPU independently constructs an approximate neighbor graph over its own partition. It then refines these local graphs by pulling relevant graph structures from other GPUs with MPI one-sided operations, which let a process read remote data directly without the owning process posting a matching receive. On a single AMD MI300A APU, SOLANET's lock-free algorithm outperforms existing GPU-based approximate neighbor graph implementations across multiple datasets, setting a new performance baseline for single-node GPU systems.
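The article does not describe SOLANET's kernels or MPI code, so the following is only a minimal single-process Python sketch of the two phases described above, under stated assumptions. Plain lists stand in for per-GPU partitions; `local_knn_graph` uses brute force in place of a GPU approximate k-NN kernel; the hypothetical `RemoteWindow` class models each remote read as one one-sided "get" in which the owning partition does not participate; and greedy search over the remote graph is an assumed way of choosing which remote neighbor lists to pull, not necessarily SOLANET's method.

```python
import heapq
import math


def local_knn_graph(points, k):
    """Phase 1: exact k-NN graph inside one partition (stand-in for a GPU kernel)."""
    graph = []
    for i, p in enumerate(points):
        cands = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in heapq.nsmallest(k, cands)])
    return graph


class RemoteWindow:
    """Hypothetical read-only view of another partition's vectors and graph.

    Each accessor models one one-sided 'get'; the owner does no work.
    """

    def __init__(self, points, graph):
        self.points, self.graph = points, graph
        self.gets = 0  # counts simulated one-sided reads

    def vector(self, i):
        self.gets += 1
        return self.points[i]

    def neighbors(self, i):
        self.gets += 1
        return self.graph[i]


def greedy_search(query, win, entry=0, max_steps=64):
    """Walk the remote graph toward `query`, pulling adjacency lists on demand."""
    cur = entry
    cur_d = math.dist(query, win.vector(cur))
    for _ in range(max_steps):
        best, best_d = cur, cur_d
        for nb in win.neighbors(cur):
            d = math.dist(query, win.vector(nb))
            if d < best_d:
                best, best_d = nb, d
        if best == cur:
            break  # no neighbor is closer: local minimum reached
        cur, cur_d = best, best_d
    return cur


def refine(parts, graphs, offsets, k):
    """Phase 2: merge each local k-NN list with candidates pulled from other partitions.

    Returns one list of global IDs per point, ordered partition by partition.
    """
    refined = []
    for a, (pts, g) in enumerate(zip(parts, graphs)):
        for i, p in enumerate(pts):
            # Start from the local neighbors, converted to global IDs.
            cands = {offsets[a] + j: math.dist(p, pts[j]) for j in g[i]}
            for b in range(len(parts)):
                if b == a:
                    continue
                win = RemoteWindow(parts[b], graphs[b])
                hit = greedy_search(p, win)
                # Pull the closest remote node found plus its neighbor list.
                for j in [hit, *win.neighbors(hit)]:
                    cands[offsets[b] + j] = math.dist(p, win.vector(j))
            best = heapq.nsmallest(k, cands.items(), key=lambda kv: kv[1])
            refined.append([gid for gid, _ in best])
    return refined
```

For example, with two partitions of 1-D points, `[(0.0,), (10.0,)]` and `[(0.5,), (10.5,)]`, global-ID offsets `[0, 2]`, and `k = 1`, each local graph can only link the two far-apart points inside its own partition. Refinement replaces those links with the nearby cross-partition points, giving `[[2], [3], [0], [1]]`. Because each merged list keeps the k closest of a superset of the local candidates, refinement never makes a point's neighbor distances worse.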

The distributed results are strong as well. SOLANET achieves an 11x speedup when scaling from 32 to 512 AMD APUs on 1 billion data points, and a 6.9x speedup from 64 to 512 APUs on 2 billion points. Against ideal speedups of 16x and 8x, that is roughly 69% and 86% of linear scaling, respectively. This makes it practical to build high-quality neighbor graphs for datasets that were previously too large for single-node processing. The work, posted on arXiv, positions SOLANET as a foundational tool for accelerating graph-based machine learning, clustering, and similarity search in large-scale distributed environments.
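Parallel efficiency, the measured speedup divided by the growth in hardware, is a common way to quantify how close a strong-scaling result comes to linear. The short Python helper below applies it to the two reported runs; the function name is illustrative, and the only inputs are the speedups quoted in the article.

```python
def speedup_efficiency(base_units: int, scaled_units: int, speedup: float) -> float:
    """Fraction of ideal (linear) speedup achieved when scaling base_units -> scaled_units."""
    ideal = scaled_units / base_units  # e.g. 512 / 32 = 16x more APUs
    return speedup / ideal


# Reported runs: (dataset size, starting APUs, final APUs, measured speedup)
for size, base, scaled, s in [("1B", 32, 512, 11.0), ("2B", 64, 512, 6.9)]:
    eff = speedup_efficiency(base, scaled, s)
    print(f"{size} points: {base}->{scaled} APUs, {s}x of ideal {scaled // base}x = {eff:.0%}")
```

An efficiency of 1.0 would mean perfectly linear scaling; values below 1.0 typically reflect communication and load-imbalance costs that grow with the number of devices.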

Key Points
  • Uses MPI one-sided operations for lock-free remote graph pulling across GPUs, minimizing communication overhead
  • Demonstrates 11x speedup from 32 to 512 AMD MI300A APUs for 1 billion data points, and 6.9x for 2 billion points
  • Single-APU algorithm outperforms state-of-the-art GPU-based approximate neighbor graph construction on multiple datasets

Why It Matters

Scales billion-point neighbor graph construction across hundreds of GPUs, accelerating AI data pipelines and large-scale analytics