
Generating Synthetic Citation Networks with Communities

Researchers unveil a generator that uses up to four orders of magnitude fewer parameters...

Deep Dive

A team of researchers from Poland has published a comprehensive study on generating synthetic citation networks. It introduces a new algorithm, Citation Seeder (CS), that needs far fewer parameters than existing generators. The paper, available on arXiv, compares 12 methods for generating directed, nearly acyclic graphs with community structure on 7 real citation networks, using 26 distinct metrics. The authors also propose reversing edge directions in static generators to break cycles and mimic the flow of citations, which significantly boosts the performance of a degree-corrected Stochastic Block Model.
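The idea of orienting edges so that citations only point backward in time can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes each node carries a hypothetical publication rank `age` (larger means newer), and the helper name `orient_by_age` is invented here. Any edge pointing from an older node to a newer one is reversed, so age strictly decreases along every edge and no cycle can survive.

```python
def orient_by_age(edges, age):
    """Hypothetical helper: make every edge point from a newer node to an older one.

    age[v] is an assumed publication rank (larger = newer). Edges pointing
    "forward in time" are reversed; self-loops and ties are dropped, and
    reciprocal pairs collapse into a single edge. The result is acyclic
    because age strictly decreases along every remaining edge.
    """
    out = set()
    for u, v in edges:
        if age[u] == age[v]:
            continue  # self-loop or tie: no valid citation direction
        # keep the edge if it already points to an older node, otherwise flip it
        out.add((u, v) if age[u] > age[v] else (v, u))
    return sorted(out)
```

For example, the 3-cycle 0→1→2→0 with ages `[0, 1, 2]` becomes `[(1, 0), (2, 0), (2, 1)]`, which contains no cycle.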

The key innovation is CS itself, an iterative generator grounded in the Price-Pareto model. It achieves results competitive with the top-performing baselines while using up to four orders of magnitude (10,000 times) fewer parameters. CS runs in linear O(N+E) time, where N is the number of nodes and E the number of edges, so it scales to large networks. The study also introduces an evaluation approach that distinguishes endogenous from exogenous mesoscopic similarities, revealing that high-parameter models often overfit by memorizing planted community statistics rather than producing realistic networks.
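To make the linear-time claim concrete, here is a sketch of a Price-style preferential-attachment generator with planted communities. It is not Citation Seeder: the function `price_with_communities` and its parameters `m` (citations per paper), `a` (attractiveness offset), `n_comms` and `p_in` (probability of citing within one's own community) are assumptions for illustration. Each draw takes O(1) time because a target chosen with probability proportional to `a` plus in-degree can be sampled by either picking a node uniformly or reusing the target of a random past citation, so the total cost is O(N+E).

```python
import random


def price_with_communities(n, m, a=1.0, n_comms=4, p_in=0.8, seed=None):
    """Hypothetical Price-style citation generator with planted communities.

    Illustrative sketch only; NOT the Citation Seeder algorithm from the paper.
    Each new node v joins a random community and cites up to m older nodes,
    each chosen with probability proportional to (a + in-degree), drawn from
    v's own community with probability p_in and from all nodes otherwise.
    Every edge points from the newer node to an older one, so the output is a DAG.
    """
    assert a > 0, "a must be positive so nodes with no citations can be chosen"
    rng = random.Random(seed)
    comm = []                               # comm[v]: community label of node v
    members = [[] for _ in range(n_comms)]  # nodes in each community
    cited = [[] for _ in range(n_comms)]    # one entry per citation received, per community
    all_nodes, all_cited = [], []           # the same pools, ignoring communities
    edges = []

    def pick(nodes, hits):
        # O(1) draw proportional to (a + in-degree): choose uniformly from `nodes`
        # with probability a*len(nodes) / (a*len(nodes) + len(hits)),
        # otherwise reuse the target of a uniformly random past citation.
        weight_uniform = a * len(nodes)
        if rng.random() * (weight_uniform + len(hits)) < weight_uniform:
            return rng.choice(nodes)
        return rng.choice(hits)

    for v in range(n):
        c = rng.randrange(n_comms)
        targets = set()
        for _ in range(min(m, v)):
            if members[c] and rng.random() < p_in:
                targets.add(pick(members[c], cited[c]))
            else:
                targets.add(pick(all_nodes, all_cited))
        for u in targets:  # duplicate draws collapse, so v may cite fewer than m nodes
            edges.append((v, u))
            cited[comm[u]].append(u)
            all_cited.append(u)
        comm.append(c)
        members[c].append(v)
        all_nodes.append(v)
    return edges, comm
```

The fraction of edges whose endpoints share a community gives a quick check that the planted structure shows up: with `p_in=0.8` and four communities it should sit well above the 0.25 expected by chance.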

Key Points
  • Citation Seeder (CS) uses up to 10,000 times fewer parameters than competing models while remaining competitive with the best baselines
  • CS runs in linear O(N+E) time (N nodes, E edges), so it can generate large-scale citation networks efficiently
  • Reversing edge directions in static generators significantly improves the performance of the degree-corrected Stochastic Block Model
  • The study evaluated 12 generation methods on 7 real citation networks using 26 metrics, with the aim of better community detection benchmarks

Why It Matters

Enables more realistic benchmarking of community detection algorithms with dramatically less computational overhead.