Beyond Single-Score Ranking: Facet-Aware Reranking for Controllable Diversity in Paper Recommendation
New AI model separates 'what problem' from 'how it's solved' for smarter academic search.
Researcher Duan Ming Tao has introduced SciFACE (Scientific Faceted Cross-Encoder), a novel reranking framework designed to solve a core flaw in current academic search. Existing systems output a single, monolithic similarity score, mixing different types of relatedness. SciFACE instead models two distinct facets independently: the 'Background' (what problem is studied) and the 'Method' (how it is solved). This allows users to understand and control *why* a paper is recommended, moving beyond a simple ranked list.
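To make the idea of controllable facets concrete, here is a minimal sketch of how two separate facet scores could be blended at query time. It is not SciFACE's actual interface: the `Candidate` class, the `rerank` function, the `method_weight` parameter, and the example scores are all hypothetical, and the linear blend is just one simple way to expose control to a user.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    paper_id: str
    background: float  # assumed facet score: similarity in the problem studied
    method: float      # assumed facet score: similarity in how it is solved


def rerank(candidates, method_weight=0.5):
    """Sort candidates by a convex blend of the two facet scores.

    method_weight=0.0 ranks purely by Background similarity,
    method_weight=1.0 ranks purely by Method similarity.
    """
    if not 0.0 <= method_weight <= 1.0:
        raise ValueError("method_weight must be in [0, 1]")

    def blended(c):
        return (1.0 - method_weight) * c.background + method_weight * c.method

    return sorted(candidates, key=blended, reverse=True)


# Made-up scores for illustration only.
candidates = [
    Candidate("A", background=0.9, method=0.2),
    Candidate("B", background=0.4, method=0.8),
    Candidate("C", background=0.6, method=0.6),
]

by_problem = [c.paper_id for c in rerank(candidates, method_weight=0.0)]
by_method = [c.paper_id for c in rerank(candidates, method_weight=1.0)]
```

Because the two scores stay separate until the final step, the same candidate pool yields a different ordering depending on whether the reader cares more about the shared problem or the shared technique, and each score can be shown next to the result as an explanation.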
To train the model, the author created a dataset of 5,891 real seed-candidate paper pairs, with facet-specific similarity labels generated by GPT-4o-mini and validated against human judgments. On the CSFCube benchmark, SciFACE scored 70.63 NDCG@20 on the Background facet (5.9 points above SPECTER) and 49.06 NDCG@20 on the Method facet, 31.1 points above SPECTER. It also outperformed the FaBLE model by 4.1 points on Method NDCG while training on those 5,891 labeled pairs, compared to FaBLE's 40,000 synthetic augmentations.
This suggests that targeted, grounded labeling can be more data-efficient than a much larger set of synthetic augmentations for learning fine-grained scientific similarity. The work highlights a shift from opaque, single-score ranking to transparent, facet-aware systems that give researchers more precise control over the diversity and focus of their literature recommendations.
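For readers unfamiliar with the metric, the following is a minimal sketch of how NDCG@k is commonly computed for one query. It assumes linear gain and a log2 rank discount, and it assumes the ranked list contains every judged candidate; the CSFCube evaluation scripts may differ in these details. The function names are illustrative. The function returns a value between 0 and 1, and reported figures such as 70.63 appear to be that value scaled by 100.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevances."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so the top result is divided by log2(2) = 1
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances, k=20):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances holds the human relevance grade of each candidate,
    in the order the system ranked them.
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant candidates for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


perfect = ndcg_at_k([3, 2, 1, 0])   # already in ideal order
swapped = ndcg_at_k([0, 1], k=2)    # the only relevant item is ranked second
```

A benchmark score is then typically the average of this per-query value over all queries in a facet, which is why Background and Method can be reported separately.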
- Models two independent facets: Background (problem) and Method (solution), allowing for controllable, diverse recommendations.
- Achieves a 49.06 NDCG@20 score on the Method facet, beating the SPECTER baseline by 31.1 points.
- Trained on only 5,891 GPT-4o-mini-labeled paper pairs, suggesting that a small set of grounded labels can be more data-efficient than FaBLE's 40,000 synthetic augmentations.
Why It Matters
Enables researchers to find papers by specific similarity type, improving discovery of novel methods and interdisciplinary connections.