Research & Papers

Can LLMs Predict Academic Collaboration? Topology Heuristics vs. LLM-Based Link Prediction on Real Co-authorship Networks

LLMs outperform traditional network heuristics on a roughly 10M-author dataset, reaching recall of up to 92.9% on new collaborations.

Deep Dive

A new study by researchers Fan Huang and Munjung Kim demonstrates that large language models can effectively predict future academic collaborations using only researcher metadata, without access to network structure. Testing Qwen2.5-72B-Instruct on massive OpenAlex co-authorship networks (9.96M authors, 108.7M edges) across three historical AI research eras, the LLM achieved AUROC scores of 0.714-0.789 for new-edge prediction under natural class imbalance, outperforming traditional topology heuristics like Common Neighbors, Jaccard similarity, and Preferential Attachment. The model showed recall up to 92.9%, meaning it could identify nearly all future collaborations from candidate pairs.
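For readers unfamiliar with the topology heuristics named above, here is a minimal sketch of how they are conventionally defined on an undirected graph. The toy graph and author names are hypothetical and are not taken from the study. Adamic-Adar, which the article mentions later, is included for completeness.

```python
from math import log

# Toy undirected co-authorship graph (hypothetical authors, not from the study).
# Each key maps an author to the set of their co-authors.
graph = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "cy", "dee"},
    "cy": {"ana", "bo"},
    "dee": {"bo"},
    "eli": {"fay"},
    "fay": {"eli"},
}


def common_neighbors(g, u, v):
    """Number of co-authors that u and v share."""
    return len(g[u] & g[v])


def jaccard(g, u, v):
    """Shared co-authors divided by the union of both co-author sets."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def adamic_adar(g, u, v):
    """Sum over shared co-authors w of 1 / log(degree(w)).

    A shared co-author is linked to both u and v, so its degree is at
    least 2 and the logarithm is strictly positive.
    """
    return sum(1.0 / log(len(g[w])) for w in g[u] & g[v])


def preferential_attachment(g, u, v):
    """Product of the two authors' degrees."""
    return len(g[u]) * len(g[v])


if __name__ == "__main__":
    for pair in [("ana", "dee"), ("ana", "eli")]:
        print(pair,
              common_neighbors(graph, *pair),
              jaccard(graph, *pair),
              adamic_adar(graph, *pair),
              preferential_attachment(graph, *pair))
```

In this toy graph, "ana" and "dee" share one co-author, while "ana" and "eli" share none, so every neighborhood-overlap score for the second pair is zero.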

Critically, the research revealed that 78.6-82.7% of new collaborations occur between authors with no common neighbors, a scenario in which common-neighbor-based heuristics such as Common Neighbors and Jaccard similarity all score zero. In these cases, the LLM still achieved AUROC 0.652 by reasoning from author metadata alone. The study found that research concepts were the dominant predictive signal, with their removal dropping AUROC by 0.047-0.084. Providing pre-computed graph features to the LLM degraded performance due to anchoring effects, suggesting that LLMs and topology methods should operate as separate, complementary channels rather than being combined in a single prompt.
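To see why an all-zero score carries no ranking signal, recall that AUROC is the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The labels and the `metadata_scores` values are made up for illustration and are not results from the study.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison: P(positive score > negative score),
    counting ties as 0.5. Quadratic in the number of pairs, so only
    suitable for small illustrative inputs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 1 = the pair later co-authored, 0 = it did not (hypothetical labels).
labels = [1, 0, 0, 0, 1, 0]

# A heuristic that scores every candidate pair zero ties everywhere.
zero_scores = [0.0] * len(labels)

# Hypothetical scores from a metadata-based predictor.
metadata_scores = [0.9, 0.2, 0.4, 0.1, 0.3, 0.6]

if __name__ == "__main__":
    print(auroc(labels, zero_scores))
    print(auroc(labels, metadata_scores))
```

A constant score yields an AUROC of exactly 0.5, the chance level, which is the baseline against which the reported 0.652 on no-common-neighbor pairs should be read.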

The temporal metadata ablation confirmed research concepts as the strongest predictive signal, while socio-cultural factors such as name-inferred ethnicity and institutional country added no predictive power beyond topology, reflecting the demographic homogeneity of AI research. A node2vec baseline performed comparably to Adamic-Adar, which indicates that the LLM draws on fundamentally different information (author metadata) rather than encoding the same structural signals in another form. This suggests LLMs could enhance recommendation systems for research collaboration, grant partnerships, and interdisciplinary team formation.

Key Points
  • Qwen2.5-72B achieved AUROC 0.714-0.789 on new collaboration prediction, outperforming all topology heuristics
  • 78.6-82.7% of new collaborations occurred between authors with no common neighbors; on those pairs the LLM still reached AUROC 0.652 from metadata alone
  • Research concepts were the dominant predictive signal, with removal dropping AUROC by 0.047-0.084

Why It Matters

LLMs could improve academic matchmaking and research collaboration platforms by identifying promising partnerships that topology-based methods miss, particularly between researchers with no shared co-authors.