UniRank: A Multi-Agent Calibration Pipeline for Estimating University Rankings from Anonymized Bibliometric Signals
Researchers use multi-agent LLMs to predict global university rankings from anonymized bibliometric data alone.
Pedram Riyazimehr and Seyyed Ehsan Mahmoudi have developed UniRank, a multi-agent LLM pipeline that estimates global university rankings using only anonymized bibliometric data. The system uses a three-stage architecture that processes publicly available data from OpenAlex and Semantic Scholar. Institutional names, countries, DOIs, paper titles, and collaboration countries are all redacted so that rankings the LLM may have memorized cannot influence the results.
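The redaction step can be illustrated with a minimal sketch. The record layout and the field names below (`institution_name`, `country`, `doi`, `title`, `collaboration_countries`, `works_count`, `cited_by_count`) are hypothetical and chosen only to mirror the categories listed above; they are not taken from the UniRank code or from the OpenAlex and Semantic Scholar schemas.

```python
# Hypothetical field names mirroring the redacted categories in the summary;
# the real pipeline's schema is not described in this article.
REDACTED_FIELDS = {
    "institution_name",
    "country",
    "doi",
    "title",
    "collaboration_countries",
}


def redact(record):
    """Recursively drop identifying fields from a nested dict/list record."""
    if isinstance(record, dict):
        return {
            key: redact(value)
            for key, value in record.items()
            if key not in REDACTED_FIELDS
        }
    if isinstance(record, list):
        return [redact(item) for item in record]
    return record  # numbers, strings, etc. pass through unchanged


# Toy record, invented for illustration only.
record = {
    "institution_name": "Example University",
    "country": "XX",
    "works_count": 1200,
    "papers": [
        {
            "title": "An example paper",
            "doi": "10.0000/example",
            "cited_by_count": 12,
            "collaboration_countries": ["YY"],
        }
    ],
}

anonymized = redact(record)
# anonymized == {"works_count": 1200, "papers": [{"cited_by_count": 12}]}
```

Only the numeric signals survive, which is the property the anonymization relies on: the model sees counts and citation figures but nothing that names the institution.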
On the Times Higher Education (THE) World University Rankings (n = 352), UniRank achieved a Spearman correlation of ρ = 0.769 and a Kendall τ of 0.591, indicating strong agreement between predicted and actual rank order. The Mean Absolute Error was 251.5 rank positions and the Median Absolute Error was 131.5. The researchers also measured a Memorization Index of exactly zero, meaning that none of the 352 predictions matched the actual rank exactly. They present this as strong evidence that the pipeline reasons from the bibliometric signals rather than recalling memorized rankings.
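As a rough illustration of how these headline metrics relate to one another, here is a minimal standard-library sketch. It assumes tie-free rank lists, uses Kendall's tau-a, and treats the Memorization Index as the fraction of exact-match predictions; the exact definitions used in the paper may differ, and the toy data is invented.

```python
from itertools import combinations
from statistics import mean, median


def to_ranks(values):
    """Convert tie-free values to ranks 1..n (smallest value gets rank 1)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_rho(pred, true):
    """Spearman's rho via the shortcut formula (valid only without ties)."""
    rp, rt = to_ranks(pred), to_ranks(true)
    n = len(rp)
    d2 = sum((a - b) ** 2 for a, b in zip(rp, rt))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(pred, true):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    for (p1, t1), (p2, t2) in combinations(zip(pred, true), 2):
        s = (p1 - p2) * (t1 - t2)
        score += (s > 0) - (s < 0)
    n = len(pred)
    return score / (n * (n - 1) / 2)


def rank_metrics(pred, true):
    abs_err = [abs(p - t) for p, t in zip(pred, true)]
    return {
        "spearman": spearman_rho(pred, true),
        "kendall": kendall_tau(pred, true),
        "mae": mean(abs_err),
        "medae": median(abs_err),
        # Assumed definition: share of predictions that hit the exact rank.
        "memorization_index": sum(e == 0 for e in abs_err) / len(abs_err),
    }


# Toy ranks, invented for illustration only.
true = [1, 2, 3, 4, 5]
pred = [2, 1, 4, 5, 3]
metrics = rank_metrics(pred, true)
# spearman = 0.6, kendall = 0.4, mae = 1.2, medae = 1, memorization_index = 0.0
```

The toy example shows that a Memorization Index of zero and a clearly positive rank correlation can coexist: every prediction is off by at least one position, yet the overall ordering is largely preserved.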
Performance varied substantially across university tiers: elite institutions showed MAE = 60.5 and hit@100 = 90.5%, while tail-tier universities showed MAE = 328.2 and hit@100 = 20.8%. The mean signed error was +190.1 positions, meaning that on average the system predicts a worse (numerically higher) rank than the actual one. This degradation pattern and the systematic positive bias are cited as further support for the conclusion that UniRank is performing genuine analysis. The research demonstrates how multi-agent LLM systems can be structured to perform complex analytical tasks while guarding against the data contamination and memorization that often undermine AI evaluations.
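The per-tier breakdown can be sketched as follows. This is an illustrative reading, not the paper's code: it assumes hit@100 means the prediction lies within 100 positions of the actual rank, that signed error is predicted minus actual, and that tiers are defined by ranges of the actual rank. The tier boundaries and the data are hypothetical.

```python
from statistics import mean


def tier_report(pred, true, tiers, k=100):
    """Per-tier MAE, mean signed error, and hit@k (assumed: |error| <= k)."""
    report = {}
    for name, (low, high) in tiers.items():
        # Signed error: positive means the predicted rank is worse than actual.
        errors = [p - t for p, t in zip(pred, true) if low <= t <= high]
        if not errors:
            continue
        report[name] = {
            "n": len(errors),
            "mae": mean(abs(e) for e in errors),
            "signed_error": mean(errors),
            "hit_at_k": sum(abs(e) <= k for e in errors) / len(errors),
        }
    return report


# Hypothetical tier boundaries and toy ranks, invented for illustration.
tiers = {"elite": (1, 100), "tail": (101, 2000)}
true = [10, 50, 90, 400, 900]
pred = [30, 40, 210, 650, 950]

report = tier_report(pred, true, tiers)
# elite: mae = 50, hit_at_k = 2/3; tail: mae = 150, signed_error = 150, hit_at_k = 0.5
```

Reporting the signed error alongside MAE separates bias from spread: a large positive signed error with a similar MAE suggests most misses fall on the same side of the actual rank.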
- UniRank achieves a Spearman correlation of ρ = 0.769 with THE rankings using only anonymized bibliometric data
- The system has a Memorization Index of exactly zero (no exact-match predictions), which the researchers present as strong evidence of genuine reasoning rather than recall of memorized rankings
- Accuracy degrades from the elite tier (MAE = 60.5) to the tail tier (MAE = 328.2), a pattern cited as further support for genuine analysis
Why It Matters
The work shows how multi-agent LLM systems can be structured to perform complex analytical reasoning while guarding against memorization, which advances evaluation methodology.