Research & Papers

JZ-Tree: GPU friendly neighbour search and friends-of-friends with dual tree walks in JAX plus CUDA

New JAX/CUDA library achieves order-of-magnitude performance gains for neighbor search and clustering on large datasets.

Deep Dive

A team of researchers including Jens Stücker, Oliver Hahn, and others has introduced JZ-Tree, an open-source library that rethinks spatial tree algorithms for modern GPU hardware. Traditional tree-based methods are efficient on CPUs but often struggle on GPUs because of irregular memory access and thread divergence. JZ-Tree addresses this with a novel 'plane-based tree hierarchy' based on Morton (z-order) ordering, which produces a flattened data layout designed for GPU architectures. This layout enables efficient dual-tree traversal in which groups of threads work collaboratively, resulting in highly coalesced memory access patterns.
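
To make the Morton (z-order) idea concrete, here is a minimal sketch in plain Python. It is not JZ-Tree code and does not reproduce the plane-based hierarchy; the function names (`quantize`, `morton3d`, `sort_by_morton`), the 10-bit grid resolution, and the assumption that points lie in the unit cube are all illustrative choices. It only shows how interleaving the bits of quantized coordinates gives a one-dimensional key, so that sorting by the key tends to place spatially nearby points next to each other in memory.

```python
def quantize(coord: float, bits: int) -> int:
    """Map a coordinate in [0, 1) to an integer cell index in [0, 2**bits)."""
    cells = 1 << bits
    return min(int(coord * cells), cells - 1)


def morton3d(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of three cell indices into one z-order key."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)       # x takes bit positions 0, 3, 6, ...
        code |= ((iy >> b) & 1) << (3 * b + 1)   # y takes bit positions 1, 4, 7, ...
        code |= ((iz >> b) & 1) << (3 * b + 2)   # z takes bit positions 2, 5, 8, ...
    return code


def sort_by_morton(points, bits: int = 10):
    """Return the points (assumed to lie in the unit cube) ordered along the z-curve."""
    def key(p):
        return morton3d(*(quantize(c, bits) for c in p), bits=bits)
    return sorted(points, key=key)


if __name__ == "__main__":
    pts = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.6, 0.1, 0.1)]
    for p in sort_by_morton(pts):
        print(p)
```

A GPU implementation would compute the keys in parallel and use a parallel sort; the pure-Python loop here is only meant to show the bit layout.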

The team implemented two spatial algorithms to demonstrate the approach: exact k-nearest neighbor search and friends-of-friends (FoF) clustering. For large problems with more than 10 million data points, JZ-Tree runs more than an order of magnitude (10x+) faster than the closest competing GPU libraries. The framework also scales well across distributed multi-GPU systems. Built with JAX and CUDA, JZ-Tree is intended not as a one-off solution but as a foundation for efficiently porting a broad class of tree-based algorithms to GPUs, with potential applications in high-performance computing from astrophysics to machine learning.
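
For readers unfamiliar with the two algorithms, the sketch below gives brute-force reference definitions in plain Python. This is deliberately not the dual-tree method JZ-Tree uses: it compares every pair of points, costs O(n²), and is only practical for small inputs. The function names, the choice to exclude a point from its own neighbor list, and the use of the smallest member index as the group label are assumptions made for illustration.

```python
from math import dist


def knn_bruteforce(points, k):
    """For each point, return the indices of its k nearest other points (exact)."""
    result = []
    for i, p in enumerate(points):
        # Sort all other points by distance; ties are broken by index.
        others = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in others[:k]])
    return result


def friends_of_friends(points, linking_length):
    """Label each point with the smallest index in its friends-of-friends group.

    Two points are 'friends' if they are within `linking_length` of each other;
    a group is a connected set of friends (friends of friends, and so on).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as the root so labels are deterministic.
            parent[max(ri, rj)] = min(ri, rj)

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= linking_length:
                union(i, j)
    return [find(i) for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0)]
    print(knn_bruteforce(pts, k=1))
    print(friends_of_friends(pts, linking_length=0.6))
```

In the example, the first three points form one group even though the two end points are farther apart than the linking length, because they are chained through the middle point. Tree-based methods exist to avoid the all-pairs loop shown here.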

Key Points
  • Uses a novel Morton 'plane-based tree hierarchy' to avoid GPU bottlenecks such as thread divergence and irregular memory access
  • Delivers a 10x+ speedup over the closest competing GPU libraries for k-nearest neighbor search and friends-of-friends clustering on datasets larger than 10 million points
  • Provides an open-source JAX/CUDA implementation that scales across multiple GPUs and serves as a foundation for other tree algorithms

Why It Matters

By making exact neighbor search and clustering much faster on GPUs, JZ-Tree could make large-scale spatial analysis more practical in fields like cosmology, physics simulations, and machine learning.