Enhancing Requirements Traceability Link Recovery: A Novel Approach with T-SimCSE
A new AI model automates linking software requirements to code without needing labeled training data.
A team of researchers has introduced T-SimCSE, a novel AI-powered approach designed to automatically recover traceability links between software requirements and other artifacts such as code or test cases. This process, crucial for maintaining software quality and managing changes, is notoriously difficult and often done manually. The new method uses the SimCSE pre-trained language model to calculate the semantic similarity between a requirement and each potential target artifact. Its key innovation is a new metric called 'specificity', which is used to reorder the candidate artifacts; links are then created between each requirement and its top-K most relevant targets.
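The similarity-and-top-K step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional vectors are toy stand-ins for SimCSE sentence embeddings, the artifact file names are hypothetical, and cosine similarity is assumed as the similarity measure. The 'specificity' reordering is left out because this summary does not define how it is computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def top_k_links(req_vec, target_vecs, k):
    """Rank target artifacts by similarity to one requirement; keep the best k."""
    scored = [(name, cosine(req_vec, vec)) for name, vec in target_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (assumption): real vectors would come from a SimCSE encoder.
requirement = [0.9, 0.1, 0.0]
targets = {
    "LoginController.java": [0.8, 0.2, 0.1],   # hypothetical artifact
    "ReportExporter.java": [0.0, 0.1, 0.9],    # hypothetical artifact
    "AuthServiceTest.java": [0.7, 0.0, 0.3],   # hypothetical artifact
}

links = top_k_links(requirement, targets, k=2)
for name, score in links:
    print(f"{name}: {score:.3f}")
```

In a fuller pipeline, a re-ranking step would sit between the sort and the final cut-off, which is where a metric such as specificity would be applied.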
T-SimCSE addresses two major limitations of previous AI methods: insufficient accuracy and a heavy reliance on large, labeled datasets for training, which are rarely available in software engineering. Because SimCSE performs well without labeled data, the approach is both more widely applicable and more effective. The team evaluated T-SimCSE against other approaches on ten public datasets. The results showed that it achieves superior performance, particularly in recall (the share of true links that are recovered) and Mean Average Precision (MAP), a measure of ranking quality.
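For readers unfamiliar with the two metrics, the sketch below shows the standard textbook definitions of recall at a cut-off K and of MAP. The requirement IDs, artifact names, and rankings are invented for illustration, and the exact variants used in the evaluation (for example, the value of K) are not stated in this summary, so treat them as assumptions.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the true links that appear among the top-k ranked artifacts."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def average_precision(ranked, relevant):
    """Mean of the precision values measured at each rank where a true link occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(rankings, ground_truth):
    """Average of average_precision over all requirements."""
    scores = [average_precision(rankings[req], ground_truth[req]) for req in rankings]
    return sum(scores) / len(scores) if scores else 0.0


# Invented example: ranked artifacts per requirement, and the true links.
rankings = {
    "R1": ["a", "b", "c", "d"],
    "R2": ["c", "a", "b", "d"],
}
ground_truth = {
    "R1": {"a", "c"},
    "R2": {"b"},
}

print(recall_at_k(rankings["R1"], ground_truth["R1"], k=2))
print(mean_average_precision(rankings, ground_truth))
```

Recall rewards finding more of the true links regardless of order, while MAP also rewards placing them near the top of the ranking, which is why a reordering step can raise MAP without changing which artifacts are retrieved.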
- Based on the SimCSE pre-trained language model, requiring no labeled data for training.
- Introduces a new 'specificity' metric to improve the ranking of related software artifacts.
- Outperformed other methods on ten datasets, showing superior recall and Mean Average Precision (MAP).
Why It Matters
Automates a critical, tedious software engineering task, improving accuracy and reducing manual effort for developers.