Research & Papers

Vertex Merging & Splitting Errors Distort Coauthorship Networks, Counterfactual Analysis Finds

Initial-based disambiguation makes networks appear smaller and more connected than they really are.

Deep Dive

A new counterfactual study by Jinseok Kim investigates how author name ambiguity in coauthorship data distorts network metrics. The paper, published on arXiv and presented at ComplexNetworks2025, applies two widely used disambiguation heuristics based on forename initials to three large coauthorship datasets that had previously been accurately disambiguated by an algorithmic method. By randomly varying the number of merged vertices (distinct authors collapsed into one) and split vertices (one author divided into several), the study simulates the errors induced by name ambiguity and computes nine standard network metrics across multiple scenarios.
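To make merging and splitting concrete, here is a minimal Python sketch of two initial-based keys. It assumes the two heuristics are a "first-initial" key (surname plus first forename initial) and an "all-initials" key (surname plus every forename initial); the summary above does not name them, so treat that as an assumption. The function names and the author names in the usage note are hypothetical and are not taken from the paper or its data.

```python
def first_initial_key(surname, forenames):
    """Assumed heuristic: surname plus the first forename initial."""
    return f"{surname.lower()}, {forenames.strip()[0].lower()}"


def all_initials_key(surname, forenames):
    """Assumed heuristic: surname plus the initial of every forename part."""
    parts = forenames.replace(".", " ").split()
    initials = "".join(part[0].lower() for part in parts)
    return f"{surname.lower()}, {initials}"


# Merging: two different (hypothetical) people collapse into one vertex.
merged_a = first_initial_key("Kim", "Jinseok")
merged_b = first_initial_key("Kim", "Jiyoung")

# Splitting: one (hypothetical) person who sometimes omits a middle
# initial ends up as two vertices under the all-initials key.
split_a = all_initials_key("Smith", "John A.")
split_b = all_initials_key("Smith", "John")
```

Here `merged_a` and `merged_b` are the same string, so the two people become one vertex, while `split_a` and `split_b` differ, so one person becomes two.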

The results reveal systematic biases: initial-based disambiguation produces networks that are smaller and more densely connected than the ground truth. Some metric values are underestimated and others are inflated, and both kinds of error push in the same direction: the network looks more cohesive than it really is, individual authors seem more collaborative, and research communities seem less fragmented. The study emphasizes that such errors can lead to invalid conclusions about collaboration patterns, community structure, and research dynamics. Kim urges researchers to adopt careful disambiguation strategies that go beyond simplistic initial-based methods, so that findings from coauthorship network analysis are rigorous and valid.
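The direction of the bias can be reproduced on a toy example. The sketch below is not the paper's pipeline: the three papers, the author IDs, and the names are invented. The three metrics (vertex count, density, number of connected components) were chosen for illustration and are not necessarily among the nine the study reports. It builds the same coauthorship network twice, once with ground-truth author IDs and once with a first-initial key, and summarizes each.

```python
from itertools import combinations

# Hypothetical papers: each author is (true_id, surname, forenames).
papers = [
    [("a1", "Kim", "Jinseok"), ("a2", "Lee", "Mina")],
    [("a3", "Kim", "Jiyoung"), ("a4", "Park", "Sol")],
    [("a2", "Lee", "Mina"), ("a5", "Cho", "Dana")],
]


def build_network(papers, label):
    """Return {vertex: set of neighbours}, naming vertices with label(author)."""
    adj = {}
    for authors in papers:
        names = {label(a) for a in authors}
        for name in names:
            adj.setdefault(name, set())
        for u, v in combinations(sorted(names), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def summarize(adj):
    """Vertex count, edge count, density, and connected-component count."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]  # depth-first search over one component
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adj[v] - seen)
    return {"vertices": n, "edges": m, "density": density,
            "components": components}


truth = summarize(build_network(papers, lambda a: a[0]))
initials = summarize(build_network(
    papers, lambda a: f"{a[1].lower()}, {a[2][0].lower()}"))
```

In this toy case the two authors named Kim merge into one vertex, so `initials` has fewer vertices, higher density, and fewer components than `truth`. Real datasets also contain split vertices, which this example leaves out.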

Key Points
  • Initial-based disambiguation underestimates network size and overestimates connectivity.
  • Authors appear more collaborative and communities less fragmented than they actually are.
  • Study used three large coauthorship networks with accurate algorithmic disambiguation as ground truth.

Why It Matters

Researchers must use careful name disambiguation to avoid misleading conclusions about collaboration patterns.