Evaluating LLM-Based Goal Extraction in Requirements Engineering: Prompting Strategies and Their Limitations
Researchers find AI can accelerate but not replace manual goal extraction.
A team of researchers from Politecnico di Torino published a paper on arXiv evaluating LLM-based goal extraction in requirements engineering. They proposed a pipeline that automates the Goal-Oriented Requirements Engineering (GORE) process by extracting functional goals from software documentation in three phases: actor identification, high-level goal extraction, and low-level goal extraction. The pipeline chains LLMs with engineered prompts, and the authors experimented with different in-context learning variants as well as a generation-critic feedback loop in which a second LLM critiques and refines the first model's output.
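The article does not reproduce the authors' prompts or code, but the described architecture is simple to sketch. The following is a minimal, hypothetical Python sketch assuming a generic `Llm` callable (prompt in, completion out); the prompt wording, the `generate_with_critic` helper, and the "OK" stop condition are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable

# Any function mapping a prompt string to a completion string.
Llm = Callable[[str], str]

def generate_with_critic(generator: Llm, critic: Llm, task: str,
                         max_rounds: int = 2) -> str:
    """One LLM drafts an answer; a second LLM critiques it; the first revises."""
    draft = generator(task)
    for _ in range(max_rounds):
        critique = critic(
            "Review the following output for missing, wrong, or redundant "
            f"items. Reply 'OK' if there are no issues.\n{draft}"
        )
        if critique.strip().upper().startswith("OK"):  # naive stop condition
            break
        draft = generator(
            f"{task}\n\nPrevious answer:\n{draft}\n\n"
            f"Critique:\n{critique}\n\nProduce a corrected answer."
        )
    return draft

def extract_goals(generator: Llm, critic: Llm, doc: str) -> dict:
    # Phase 1: identify the actors mentioned in the documentation.
    actors = generate_with_critic(
        generator, critic,
        f"List the actors (users, external systems) described in:\n{doc}")
    # Phase 2: extract high-level goals, conditioned on the identified actors.
    high_level = generate_with_critic(
        generator, critic,
        f"Given these actors:\n{actors}\n"
        f"Extract the high-level functional goals from:\n{doc}")
    # Phase 3: decompose each high-level goal into low-level goals.
    low_level = generate_with_critic(
        generator, critic,
        f"Decompose these high-level goals into low-level goals:\n{high_level}")
    return {"actors": actors, "high_level": high_level, "low_level": low_level}
```

Any chat-completion client can be wrapped to match the `Llm` signature, so the chain stays model-agnostic and the generator and critic can be distinct models, as in the paper's two-LLM setup.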
The pipeline achieved 61% accuracy in low-level goal identification, the final stage. Zero-shot prompting combined with the feedback loop outperformed standalone Few-shot prompting, but adding the feedback loop to Few-shot brought no further gain, suggesting that the prompting strategy applied to the critic LLM is the primary performance bottleneck. The authors conclude the approach is best suited as a tool to accelerate manual extraction rather than as a full replacement; future work will integrate retrieval-augmented generation (RAG) and Chain-of-Thought prompting to improve accuracy.
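For readers unfamiliar with the terminology: the in-context learning variants differ only in whether worked examples precede the instruction. A hypothetical illustration follows (the exemplar text is invented; the paper's actual few-shot examples are not shown in the article):

```python
# Zero-shot: instruction only, no examples.
ZERO_SHOT = (
    "Extract the low-level functional goals from the documentation below.\n"
    "Documentation:\n{doc}"
)

# Few-shot: the same instruction preceded by one or more worked examples.
# This exemplar is invented for illustration.
FEW_SHOT = (
    "Extract the low-level functional goals from the documentation below.\n\n"
    "Example documentation: The operator can export monthly reports as PDF.\n"
    "Example goal: Export monthly report as PDF (actor: operator)\n\n"
    "Documentation:\n{doc}"
)
```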
- The LLM pipeline achieved 61% accuracy in low-level goal identification.
- Zero-shot with feedback outperformed standalone Few-shot, but adding feedback to Few-shot showed no improvement over Few-shot alone.
- The approach is best for accelerating manual extraction, not replacing it.
Why It Matters
LLMs can speed up requirements engineering but still need human oversight for reliable results.