Research & Papers

An Efficient Streaming Algorithm for Approximating Graphlet Distributions

New method samples subgraphs in massive graphs using O(1/c) passes, down from the previous O(log n).

Deep Dive

A team including Marco Bressan, T-H. Hubert Chan, Qipeng Kuang, and Mauro Sozio has introduced a streaming algorithm for approximating the frequencies of induced k-vertex subgraphs (k-graphlets) in massive graphs. The key advance is beating the O(log n)-pass bound of the prior state of the art by Bourreau et al. (NeurIPS 2024): the new algorithm needs only O(1/c) passes for any fixed c > 0, using O~(n^(1+c)) memory. The authors prove that this memory usage is optimal up to a factor of O~(n^c).
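To make the target quantity concrete, here is a minimal sketch of what a graphlet distribution is for k = 3. It is an exact brute-force count over an in-memory edge list, not the paper's streaming algorithm, and it follows the common convention of counting only connected induced subgraphs. The function name and the toy graph are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def graphlet_distribution_3(edges):
    """Exact frequencies of connected induced 3-vertex subgraphs.

    Brute force over all vertex triples: only feasible for tiny graphs,
    which is exactly the cost a streaming approximation tries to avoid.
    """
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = Counter()
    for a, b, c in combinations(sorted(adj), 3):
        # Number of edges among the three vertices (0 to 3).
        m = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if m == 3:
            counts["triangle"] += 1
        elif m == 2:
            counts["path"] += 1
        # m < 2 means the induced subgraph is disconnected; skip it.

    total = sum(counts.values())
    return {name: cnt / total for name, cnt in counts.items()} if total else {}


# Toy graph (assumed for illustration): a triangle 0-1-2 with a pendant vertex 3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
dist = graphlet_distribution_3(edges)
```

On this toy graph there is one triangle and two induced paths, so the distribution puts 1/3 of the mass on triangles and 2/3 on paths. The brute-force loop inspects every triple of vertices, which is why sampling-based estimates are needed at scale.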

Experiments on real-world and synthetic graphs show that the algorithm consistently matches or significantly outperforms the prior method, especially on mildly dense graphs, where speedups reach orders of magnitude. The work addresses a critical bottleneck in graphlet analysis, the need to load the entire graph into memory, making the analysis practical for social networks, biological networks, and other large-scale systems. Preprint available on arXiv (2604.25400).

Key Points
  • Algorithm uses O(1/c) passes vs. O(log n) for the prior method, so the graph is read from storage far fewer times.
  • Memory usage is O~(n^(1+c)), proven optimal up to a factor of O~(n^c).
  • Outperforms Bourreau et al.'s NeurIPS 2024 method by orders of magnitude on mildly dense graphs.
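The pass/memory trade-off in the points above can be illustrated with some rough arithmetic. This is a sketch only: it assumes every constant hidden by the O-notation is 1, ignores the extra factors hidden by O~, and takes the logarithm base 2, none of which is stated in the source. The helper name `tradeoff` is hypothetical.

```python
import math


def tradeoff(n, c):
    """Illustrative pass count and memory for a graph with n vertices.

    Assumes hidden constants equal 1 and drops the factors hidden by O~,
    so the numbers show orders of growth, not real resource usage.
    """
    passes = math.ceil(1 / c)    # O(1/c) passes
    memory = n ** (1 + c)        # O~(n^(1+c)) memory units
    return passes, memory


n = 10**6
log_passes = math.ceil(math.log2(n))  # O(log n) passes, base 2 assumed

for c in (0.5, 0.25, 0.1):
    passes, memory = tradeoff(n, c)
    print(f"c={c}: ~{passes} passes, ~{memory:.2e} memory units "
          f"(vs ~{log_passes} passes for an O(log n) bound)")
```

Smaller c lowers memory toward n but raises the number of passes. For any fixed c the pass count does not grow with n, whereas a log n bound does.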

Why It Matters

Enables graphlet analysis on graphs too large for memory, unlocking insights in social networks, bioinformatics, and more.