BioHiCL: Hierarchical Multi-Label Contrastive Learning for Biomedical Retrieval with MeSH Labels
BioHiCL models use hierarchical MeSH annotations to improve semantic understanding in biomedical search.
A team of researchers has introduced BioHiCL (Biomedical Retrieval with Hierarchical Multi-Label Contrastive Learning), a framework for biomedical information retrieval. Developed by Mengfei Lan, Lecheng Zheng, and Halil Kilicoglu, the system addresses a key limitation in existing biomedical generative retrievers: their reliance on coarse binary relevance signals, which treat a document as either relevant or not and so fail to capture partial semantic overlap between texts. BioHiCL instead uses the structured, hierarchical annotations provided by MeSH (Medical Subject Headings) as supervision for multi-label contrastive learning.
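To make the contrast between binary and hierarchical relevance concrete, here is a minimal Python sketch. It is not the authors' method: it assumes each label is a dotted tree number in MeSH's style, treats every prefix of a tree number as an ancestor, and scores two label sets by the Jaccard overlap of their ancestor-expanded sets. The function names and the tree-number strings are illustrative placeholders.

```python
from typing import Iterable, Set


def ancestors(tree_number: str) -> Set[str]:
    """Return a dotted tree number together with all of its prefixes."""
    parts = tree_number.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts) + 1)}


def expand(labels: Iterable[str]) -> Set[str]:
    """Expand a label set with every ancestor of every label."""
    expanded: Set[str] = set()
    for label in labels:
        expanded |= ancestors(label)
    return expanded


def binary_relevance(a: Iterable[str], b: Iterable[str]) -> float:
    """Coarse signal: 1.0 only if the two documents share an exact label."""
    return 1.0 if set(a) & set(b) else 0.0


def hierarchical_overlap(a: Iterable[str], b: Iterable[str]) -> float:
    """Graded signal: Jaccard overlap of the ancestor-expanded label sets."""
    ea, eb = expand(a), expand(b)
    if not ea or not eb:
        return 0.0
    return len(ea & eb) / len(ea | eb)


# Illustrative placeholder labels: two siblings under one parent, one unrelated.
doc_a = ["C04.588.180"]
doc_b = ["C04.588.322"]
doc_c = ["D27.505"]

print(binary_relevance(doc_a, doc_b))      # 0.0: no exact label in common
print(hierarchical_overlap(doc_a, doc_b))  # 0.5: two shared ancestors out of four nodes
print(hierarchical_overlap(doc_a, doc_c))  # 0.0: different branches
```

Under the binary signal the two sibling documents look unrelated, while the graded score records that they share a parent and a grandparent in the hierarchy.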
The researchers released two model variants: BioHiCL-Base with 0.1 billion parameters and BioHiCL-Large with 0.3 billion parameters. Both models show promising performance on biomedical document retrieval, sentence similarity assessment, and question answering. A key advantage is their computational efficiency, which makes them suitable for practical deployment where resources are constrained. The paper detailing this work has been accepted to the ACL 2026 Main Conference.
By moving beyond simple binary relevance, BioHiCL's approach aims to capture the multi-faceted relationships in biomedical literature, where a single article typically carries several MeSH headings. This hierarchical view matters for applications like literature review, clinical decision support, and drug discovery, where precision and context are paramount. MeSH labels provide a standardized source of supervision that guides the model toward more accurate representations of biomedical concepts and their interconnections.
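One generic way to train with graded rather than binary supervision is a contrastive loss with soft targets. The Python sketch below illustrates that general idea only and should not be read as BioHiCL's actual objective. It assumes a hypothetical matrix `label_sim` of pairwise label-similarity scores in [0, 1], a temperature of 0.1, and toy 2-D embeddings.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def soft_contrastive_loss(
    embeddings: List[Sequence[float]],
    label_sim: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Cross-entropy between normalized label similarities (targets) and a
    softmax over embedding similarities, averaged over anchors that have at
    least one partner with non-zero label similarity."""
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        weight_sum = sum(label_sim[i][j] for j in others)
        if weight_sum == 0:
            continue  # this anchor has no positives in the batch
        logits = [cosine(embeddings[i], embeddings[j]) / temperature for j in others]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        for j, logit in zip(others, logits):
            target = label_sim[i][j] / weight_sum
            total -= target * (logit - log_z)
        counted += 1
    return total / counted if counted else 0.0


# Hypothetical batch: documents 0 and 1 have overlapping labels, 2 is unrelated.
label_sim = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]     # 0 and 1 embedded close together
misaligned = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # 0 and 1 embedded far apart

print(soft_contrastive_loss(aligned, label_sim))
print(soft_contrastive_loss(misaligned, label_sim))
```

The loss is smaller when documents with overlapping labels are embedded close together, which is the behavior a label-aware contrastive objective is meant to encourage.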
- Leverages hierarchical MeSH annotations for structured supervision in multi-label contrastive learning
- Introduces two efficient models: BioHiCL-Base (0.1B parameters) and BioHiCL-Large (0.3B parameters)
- Shows promising performance on biomedical retrieval, sentence similarity, and QA tasks while remaining computationally efficient to deploy
Why It Matters
Enables more precise, context-aware search and analysis of complex biomedical literature for researchers and clinicians.