Research & Papers

Matrix Factorization Framework for Community Detection under the Degree-Corrected Block Model

A new method processes a graph with 100,000 nodes and 1 million edges in about 4 minutes while matching the accuracy of DCBM inference.

Deep Dive

Community detection, a core task in network analysis, typically relies on block models like the degree-corrected block model (DCBM) to account for node degree heterogeneity. However, existing inference methods are computationally expensive and highly sensitive to initialization. Spectral or modularity-based alternatives are cheaper but limited to detecting specific structures, like assortative communities. In a new preprint on arXiv, researchers Alexandra Dache, Arnaud Vandaele, and Nicolas Gillis show that DCBM inference can be reformulated as a constrained nonnegative matrix factorization (NMF) problem. This insight allows them to propose a novel, structure-agnostic method that applies to any graph representable by a DCBM.
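To make the link between the DCBM and matrix factorization concrete, here is a minimal sketch based on the textbook form of the model, not on the preprint's exact formulation or constraints. Under a DCBM, the expected weight of edge (i, j) is theta_i * theta_j * omega[g_i, g_j], where theta holds per-node degree parameters, g holds community labels, and omega is a block matrix. Stacking these entries gives W @ omega @ W.T, where W is nonnegative and has exactly one nonzero per row. The function name and the toy values below are illustrative assumptions.

```python
import numpy as np


def dcbm_expected_adjacency(theta, labels, omega):
    """Build the nonnegative factor W and the expected adjacency W @ omega @ W.T.

    Hypothetical helper for illustration only; it is not taken from the preprint.
    """
    n, k = len(theta), omega.shape[0]
    Z = np.zeros((n, k))
    Z[np.arange(n), labels] = 1.0   # one-hot community membership
    W = theta[:, None] * Z          # scale each row by the node's degree parameter
    return W, W @ omega @ W.T


# Toy values (assumed): 6 nodes, 2 communities, disassortative block matrix.
theta = np.array([1.0, 0.5, 2.0, 1.5, 0.8, 1.2])
labels = np.array([0, 0, 0, 1, 1, 1])
omega = np.array([[0.05, 0.30],
                  [0.30, 0.10]])

W, P = dcbm_expected_adjacency(theta, labels, omega)

# The factorized form agrees with the entrywise DCBM definition.
P_entrywise = np.array([
    [theta[i] * theta[j] * omega[labels[i], labels[j]] for j in range(len(theta))]
    for i in range(len(theta))
])
assert np.allclose(P, P_entrywise)
```

Because omega here has larger off-diagonal than diagonal entries, the example is disassortative, the kind of structure the article says spectral and modularity-based methods can miss. Inference runs in the opposite direction: given an observed adjacency matrix, find a constrained nonnegative W and omega that explain it.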

Their approach delivers impressive scalability: it processes a graph with 100,000 nodes and 1 million edges in approximately 4 minutes on standard hardware. The proposed initialization strategy also significantly improves solution quality and reduces iteration counts for all tested inference algorithms. Benchmarks on synthetic and real-world networks confirm that the method matches DCBM inference accuracy while being faster and more robust. This work bridges matrix factorization and probabilistic modeling, offering a practical tool for large-scale network analysis in fields like social network analysis, biology, and cybersecurity.
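For a sense of scale, the back-of-the-envelope arithmetic below uses only the reported graph size. It assumes an undirected simple graph, 8-byte floating-point values, and 4-byte integer indices in a compressed sparse row (CSR) layout; the article does not state how the benchmark graph is actually stored, so these are illustrative assumptions.

```python
n_nodes = 100_000
n_edges = 1_000_000

# Each undirected edge contributes to the degree of two nodes.
avg_degree = 2 * n_edges / n_nodes

# Fraction of possible node pairs that are connected.
density = n_edges / (n_nodes * (n_nodes - 1) / 2)

# Dense n x n matrix of 8-byte floats.
dense_bytes = n_nodes ** 2 * 8

# CSR storage (assumed layout): each edge stored twice, one 8-byte value and
# one 4-byte column index per stored entry, plus n + 1 row pointers of 4 bytes.
sparse_bytes = 2 * n_edges * (8 + 4) + (n_nodes + 1) * 4

print(f"average degree: {avg_degree:.0f}")
print(f"density: {density:.2e}")
print(f"dense storage: {dense_bytes / 1e9:.0f} GB")
print(f"sparse storage: {sparse_bytes / 1e6:.1f} MB")
```

Under these assumptions the graph has an average degree of 20, and a dense adjacency matrix would need about 80 GB against roughly 24 MB in sparse form, which is why methods at this scale need to work on the sparse edge structure.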

Key Points
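  • Reformulates DCBM inference as a constrained nonnegative matrix factorization problem, so the method applies to any graph representable by a DCBM rather than only to specific structures such as assortative communities.
  • Processes a graph with 100,000 nodes and 1,000,000 edges in ~4 minutes on standard hardware, matching DCBM inference accuracy while being faster and more robust.
  • Provides a theoretically grounded initialization strategy that improves solution quality and reduces iteration counts for all tested inference algorithms.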

Why It Matters

This scalable framework makes fast, accurate community detection practical for large real-world networks.