LoRA-MME: Multi-Model Ensemble of LoRA-Tuned Encoders for Code Comment Classification
Researchers combine four LoRA-tuned transformer encoders to classify code comments across Java, Python, and Pharo.
A research team led by Md Akib Haider has introduced LoRA-MME, a multi-model ensemble architecture for code comment classification across multiple programming languages. Developed for the NLBSE'26 Tool Competition, the approach tackles multi-label classification for Java, Python, and Pharo by leveraging four distinct transformer encoders: UniXcoder, CodeBERT, GraphCodeBERT, and CodeBERTa. The system employs Parameter-Efficient Fine-Tuning (PEFT) via Low-Rank Adaptation (LoRA), which enables effective model specialization without the prohibitive memory requirements of full fine-tuning and makes advanced code analysis more accessible to researchers and developers.
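To make the mechanics concrete, the sketch below shows how one of the four encoders might be wrapped with LoRA adapters for multi-label classification. It assumes the Hugging Face transformers and peft libraries; the label count, rank, scaling factor, and target modules are illustrative choices, not values reported by the paper.

```python
# Minimal sketch: attach LoRA adapters to one encoder for multi-label
# classification. Hyperparameters here are assumptions, not the paper's.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

model_name = "microsoft/unixcoder-base"  # one of the four ensemble encoders
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=7,  # assumption: per-language label count varies in NLBSE
    problem_type="multi_label_classification",
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # low-rank dimension (assumed value)
    lora_alpha=16,                      # LoRA scaling factor (assumed value)
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections to adapt
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train
```

The same wrapping step would be repeated for CodeBERT, GraphCodeBERT, and CodeBERTa, giving four independently specialized models whose base weights stay frozen.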
The technical implementation fine-tunes each encoder independently with LoRA, then aggregates their predictions through a learned weighted ensemble strategy to maximize classification performance. The ensemble reached a weighted F1 of 0.7906 and a macro F1 of 0.6867 on the test set, demonstrating strong semantic accuracy. However, the computational overhead of running four models simultaneously resulted in a final competition score of just 41.20%, underscoring the trade-off between accuracy and inference efficiency that remains a central challenge in production AI systems. This research provides a useful framework for future work balancing model performance with practical deployment constraints in software engineering tools.
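The learned weighted ensemble can be pictured as fitting a convex combination of the four models' output probabilities on held-out data. The PyTorch sketch below is one plausible realization; the softmax parameterization, optimizer, and hyperparameters are assumptions rather than the authors' exact procedure.

```python
# Sketch: learn per-model ensemble weights on precomputed validation
# probabilities, then blend test probabilities with those weights.
import torch

def learn_ensemble_weights(val_probs, val_labels, steps=500, lr=0.05):
    """val_probs: (num_models, num_examples, num_labels) sigmoid outputs.
    val_labels: (num_examples, num_labels) multi-hot floats."""
    logits = torch.zeros(val_probs.shape[0], requires_grad=True)  # one weight per model
    optimizer = torch.optim.Adam([logits], lr=lr)
    loss_fn = torch.nn.BCELoss()
    for _ in range(steps):
        weights = torch.softmax(logits, dim=0)            # convex combination
        blended = (weights[:, None, None] * val_probs).sum(dim=0)
        loss = loss_fn(blended.clamp(1e-6, 1 - 1e-6), val_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return torch.softmax(logits.detach(), dim=0)

# Usage: blend each model's test probabilities, then threshold at 0.5
# (an assumed cutoff) for multi-label predictions.
# weights = learn_ensemble_weights(val_probs, val_labels)
# test_blend = (weights[:, None, None] * test_probs).sum(dim=0)
# preds = (test_blend > 0.5).int()
```

Note that whatever the exact weighting scheme, inference still requires a forward pass through all four encoders per comment, which is the source of the efficiency penalty the competition score reflects.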
- Combines four transformer models (UniXcoder, CodeBERT, GraphCodeBERT, CodeBERTa) using LoRA fine-tuning
- Achieved 0.7906 weighted F1 score and 0.6867 macro F1 on multi-language test data
- Highlights the accuracy/efficiency trade-off: the computational cost of four-model inference held the final competition score to 41.20%
Why It Matters
Enables automated software documentation analysis across languages while demonstrating practical limits of ensemble AI approaches.