Study shows LLMs can extract usability insights from app reviews
New research finds LLMs can identify usability issues in 300 app reviews with F-scores comparable to human raters, though results depend on the prompt
Deep Dive
A new study provides a dataset of 300 user reviews labeled by two human raters and an LLM. Judged by F-score, LLMs can generally recognize usability as a non-functional requirement, though performance and reliability depend strongly on the prompt. With prompts derived from Nielsen's heuristics, the workflow offers a quicker, cheaper alternative to traditional ML approaches for processing user requirements.
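For readers unfamiliar with the metric, here is a minimal sketch of how an F-score can be computed when LLM labels are compared against human labels. The function name `f_score`, the binary labeling scheme, and the toy labels are assumptions made for illustration; they are not taken from the study or its dataset.

```python
def f_score(gold, predicted, beta=1.0):
    """F-beta score for binary labels (True = review raises a usability issue)."""
    if len(gold) != len(predicted):
        raise ValueError("label lists must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        # No correct positive predictions: precision and recall are both 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy labels, invented for illustration (not from the study's dataset).
human_labels = [True, True, False, True, False, False]
llm_labels = [True, False, False, True, True, False]

score = f_score(human_labels, llm_labels)
print(f"F1 = {score:.3f}")
```

With `beta=1.0` this is the familiar F1 score, the harmonic mean of precision and recall, which penalizes both missed usability reviews and false alarms.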
Key Points
- LLMs achieved F-scores comparable to human raters when identifying usability issues in 300 app reviews
- Researchers built a dataset of 300 reviews labeled by two human raters and an LLM, and derived their prompts from Nielsen's 10 Usability Heuristics
- Prompt design strongly affects both the performance and the reliability of the LLM on this task
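To make the prompt-engineering idea concrete, the sketch below builds a classification prompt that uses Nielsen's 10 usability heuristics as the working definition of usability. The heuristic names are Nielsen's; the prompt wording, the `USABILITY`/`OTHER` answer format, and the helper names `build_prompt` and `parse_label` are hypothetical and are not the prompts used in the study.

```python
# Nielsen's 10 usability heuristics, used here as the definition of "usability".
NIELSEN_HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]


def build_prompt(review: str) -> str:
    """Assemble a hypothetical classification prompt for one app review."""
    heuristics = "\n".join(
        f"{i}. {name}" for i, name in enumerate(NIELSEN_HEURISTICS, start=1)
    )
    return (
        "You label app reviews for usability requirements.\n"
        "Use Nielsen's 10 usability heuristics as the definition of usability:\n"
        f"{heuristics}\n\n"
        f"Review: {review}\n"
        "Answer with exactly one word: USABILITY or OTHER."
    )


def parse_label(response: str) -> bool:
    """Map the model's one-word answer to a boolean label."""
    return response.strip().upper().startswith("USABILITY")


prompt = build_prompt("I can't find the undo button after deleting a note.")
print(prompt)
```

The resulting string can be sent to whichever LLM API is in use, and `parse_label` turns the reply into a boolean that can be scored against human labels. Because the study reports that results depend strongly on the prompt, small changes to this wording would be expected to shift the scores.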
Why It Matters
LLMs could let product teams automate usability feedback analysis at scale, more quickly and cheaply than traditional ML approaches.