Generating Synthetic Citation Networks with Communities
Researchers unveil a generator that uses up to four orders of magnitude fewer parameters...
A team of researchers from Poland has published a comprehensive study on generating synthetic citation networks, introducing a new algorithm called Citation Seeder (CS) that dramatically improves efficiency. The paper, available on arXiv, compares 12 methods for generating directed, nearly acyclic graphs with community structures across 7 real citation networks and 26 distinct metrics. The authors propose reversing edge directions in static generators to break cycles and mimic citation flow, which significantly boosts the performance of a degree-corrected Stochastic Block Model.
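The edge-reversal idea can be illustrated with a small sketch. This is one plausible reading rather than the paper's exact procedure: it assumes each node has a known publication order and flips any edge that points from an older node to a newer one, so that every citation runs from newer to older. The function name `orient_by_age` and the `order` argument are hypothetical and do not come from the paper.

```python
def orient_by_age(edges, order):
    """Return edges re-oriented so each points from a newer node to an older one.

    `order` lists the nodes from oldest to newest (an assumed input; a static
    generator would have to supply or invent such an ordering).
    """
    rank = {node: i for i, node in enumerate(order)}
    oriented = set()
    for u, v in edges:
        if u == v:
            continue  # drop self-loops: a paper does not cite itself
        # Keep the edge if the source is newer than the target, else reverse it.
        oriented.add((u, v) if rank[u] > rank[v] else (v, u))
    return sorted(oriented)


# A toy graph containing the directed cycle 0 -> 1 -> 2 -> 0.
edges = [(0, 1), (1, 2), (2, 0), (3, 1)]
dag = orient_by_age(edges, order=[0, 1, 2, 3])
print(dag)
```

Because every surviving edge goes from a higher rank to a lower one, the result cannot contain a directed cycle. Real citation networks are only nearly acyclic, so a faithful generator would presumably allow a few exceptions.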
The key innovation is the CS algorithm, an iterative generator grounded in the Price-Pareto model. It achieves results competitive with the top-performing baselines while using up to four orders of magnitude fewer parameters, that is, up to 10,000x fewer. CS runs in linear O(N+E) time, where N is the number of nodes and E the number of edges, which makes it scalable to large networks. The study also introduces a novel evaluation approach that distinguishes between endogenous and exogenous mesoscopic similarities, revealing that high-parameter models often overfit by memorizing planted community statistics rather than producing realistic networks.
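To give a feel for how an iterative, Price-style generator can run in roughly linear time, here is a minimal sketch of the classic Price cumulative-advantage model. It is not the CS algorithm: it has no communities and no Pareto component, and the parameter names `n`, `m`, and `a` are assumptions made for illustration. It relies on the well-known trick of sampling from a list of past citation targets, which picks a node with probability proportional to its in-degree in constant time.

```python
import random


def price_model(n, m, a=1.0, seed=0):
    """Grow a citation graph in which each new node cites up to m older nodes.

    An older node is chosen with probability roughly proportional to
    (in-degree + a). Edges are returned as (citing, cited) pairs.
    """
    rng = random.Random(seed)
    targets = []  # one entry per citation received; uniform draws favor high in-degree
    edges = []
    for new in range(1, n):
        k = min(m, new)  # early nodes cannot cite more papers than exist
        chosen = set()
        while len(chosen) < k:
            if targets and rng.random() < m / (m + a):
                chosen.add(rng.choice(targets))   # preferential pick
            else:
                chosen.add(rng.randrange(new))    # uniform pick among older nodes
        for old in chosen:
            edges.append((new, old))
            targets.append(old)
    return edges


edges = price_model(n=200, m=3)
print(len(edges))
```

Each edge costs an expected constant amount of work, so the expected running time is proportional to N+E. Since every edge points from a newer node to an older one, the output is acyclic by construction.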
- Citation Seeder (CS) uses up to 10,000x fewer parameters than competing models while remaining competitive with the top-performing baselines
- CS runs in O(N+E) linear time, enabling generation of large-scale citation networks efficiently
- Reversing edge directions in static generators significantly improves the degree-corrected Stochastic Block Model
- The study evaluated 12 methods across 7 real networks and 26 metrics for community detection benchmarking
Why It Matters
Enables more realistic benchmarking of community detection algorithms, using a generator that needs far fewer parameters and scales linearly with network size.