BeLink boosts biomedical entity linking accuracy by up to 24%
Instruction-tuned open-source LLMs re-rank candidates faster and more accurately, says new SIGIR 2026 paper.
Biomedical Entity Linking (BEL), the task of mapping ambiguous mentions in text to standard biomedical concepts, has been held back by the computational cost of large language models (LLMs). In a new paper accepted to ACM SIGIR 2026, researchers Darya Shlyk, Stefano Montanelli, and Lawrence Hunter tackle this bottleneck by applying instruction tuning to open-source generative models only at the re-ranking stage of the BEL pipeline. They propose a novel set-wise instruction-tuning formulation that lets the model compare multiple candidate entities simultaneously, enabling fast and accurate candidate selection without the overhead of running the LLM over the entire pipeline.
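To make the set-wise idea concrete, here is a minimal sketch of how a single prompt might present all candidates for one mention at once. It is not the paper's actual prompt or code: the `Candidate` class, the function names, the prompt wording, and the concept IDs are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    concept_id: str  # hypothetical placeholder ID, not a real ontology code
    name: str        # preferred concept name


def build_setwise_prompt(mention: str, context: str, candidates: List[Candidate]) -> str:
    """List every candidate in one prompt so the model can compare them together."""
    lines = [
        "Select the concept that the mention refers to.",
        f"Context: {context}",
        f"Mention: {mention}",
        "Candidates:",
    ]
    for i, cand in enumerate(candidates, start=1):
        lines.append(f"{i}. {cand.name} ({cand.concept_id})")
    lines.append("Answer with the number of the best candidate.")
    return "\n".join(lines)


def parse_choice(reply: str, candidates: List[Candidate]) -> Optional[Candidate]:
    """Map the model's reply to a candidate; None if no valid number is found."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    index = int(match.group())
    if 1 <= index <= len(candidates):
        return candidates[index - 1]
    return None


if __name__ == "__main__":
    cands = [
        Candidate("C001", "diabetes mellitus, type 1"),
        Candidate("C002", "diabetes mellitus, type 2"),
    ]
    print(build_setwise_prompt("T2DM", "Patient has a history of T2DM.", cands))
```

The point of the sketch is that one model call covers the whole candidate set, in contrast to scoring each candidate in a separate call.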
When tested on several BEL benchmarks, the resulting system, BeLink, achieved 3% to 24% higher linking accuracy than prior state-of-the-art approaches while also reducing overall inference time. The system is modular and end-to-end, designed for practical deployment in clinical and research settings where speed and accuracy are critical. By confining the heavy generative model to the re-ranking step, BeLink demonstrates that open-source LLMs can deliver competitive performance without the massive computational resources required by full-pipeline approaches. The work points toward more efficient biomedical information extraction using instruction-tuned models.
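The modular design described above, with the expensive model confined to re-ranking, can be pictured as a two-stage pipeline. The sketch below is an assumption-laden illustration, not BeLink's implementation: the toy ontology, the token-overlap retriever, and the stub re-ranker standing in for the LLM are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical toy ontology: placeholder concept ID -> preferred name.
ONTOLOGY: Dict[str, str] = {
    "C001": "myocardial infarction",
    "C002": "cerebral infarction",
    "C003": "myocardial ischemia",
    "C004": "type 2 diabetes mellitus",
}

# A re-ranker takes the mention and a short candidate list, returns one concept ID.
Reranker = Callable[[str, List[str]], str]


def retrieve_candidates(mention: str, k: int = 3) -> List[str]:
    """Stage 1: cheap lexical retrieval by token overlap (Jaccard), no LLM involved."""
    mention_tokens = set(mention.lower().split())

    def score(concept_id: str) -> float:
        name_tokens = set(ONTOLOGY[concept_id].split())
        return len(mention_tokens & name_tokens) / len(mention_tokens | name_tokens)

    ranked = sorted(ONTOLOGY, key=score, reverse=True)
    return ranked[:k]


def link(mention: str, rerank: Reranker, k: int = 3) -> str:
    """Stage 2: a single re-ranker call picks one concept from the short list."""
    candidates = retrieve_candidates(mention, k)
    return rerank(mention, candidates)


def first_candidate_reranker(mention: str, candidates: List[str]) -> str:
    """Stand-in for the LLM: simply trusts the retriever's top hit."""
    return candidates[0]


if __name__ == "__main__":
    print(link("acute myocardial infarction", first_candidate_reranker))
```

Because the re-ranker is just a callable, the stub could be swapped for a function that sends a set-wise prompt to an instruction-tuned model while the retrieval stage stays unchanged.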
- Set-wise instruction-tuning formulation enables fast, multi-candidate re-ranking
- Accuracy improvements of 3% to 24% over state-of-the-art BEL methods
- Reduced inference time makes LLM-based BEL feasible for real-world deployment
Why It Matters
BeLink makes LLM-powered biomedical entity linking efficient enough for real-world clinical and research applications.