Developer Tools

Study shows LLMs can extract usability insights from app reviews

New research finds LLMs can flag usability issues in a set of 300 app reviews with F-scores comparable to human raters, provided the prompt is well designed

Deep Dive

A new study provides a dataset of 300 user reviews labeled by two human raters and an LLM. Judged by F-score, LLMs can generally recognize usability as a non-functional requirement, though performance and reliability depend strongly on the prompt. With prompts derived from Nielsen's heuristics, the workflow offers a quicker, cheaper alternative to traditional ML approaches for processing user requirements.
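For readers unfamiliar with the metric, the F-score combines precision and recall into a single number. The sketch below shows how one might compare LLM labels against human labels; the function name and the six example labels are made up for illustration and are not taken from the study's dataset or code.

```python
def f1_score(gold, predicted):
    """F1 for the positive class (True = review raises a usability issue)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # both say "usability"
    fp = sum(1 for g, p in pairs if not g and p)    # only the LLM says so
    fn = sum(1 for g, p in pairs if g and not p)    # only the human says so
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, for illustration only
human_labels = [True, True, False, True, False, False]
llm_labels = [True, False, False, True, True, False]

score = f1_score(human_labels, llm_labels)
print(f"F1 = {score:.3f}")
```

Here the human labels are treated as the reference; with two human raters, the same function could be run once per rater or against their agreed labels.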

Key Points
  • LLMs achieved F-scores comparable to those of human raters when identifying usability issues in 300 app reviews
  • Researchers derived their prompts from Nielsen's 10 Usability Heuristics and provide the labeled dataset
  • Prompt design strongly affects both the performance and the reliability of the LLM on this task
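To make the prompt-engineering idea concrete, here is a minimal sketch of how a classification prompt could be assembled from Nielsen's ten heuristics. The prompt wording, the `build_prompt` and `parse_answer` helpers, and the sample review are all assumptions for illustration; the study's actual prompts are not reproduced here, and no particular LLM API is assumed.

```python
# Nielsen's 10 Usability Heuristics (titles only)
NIELSEN_HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]


def build_prompt(review: str) -> str:
    """Assemble a hypothetical yes/no classification prompt for one review."""
    numbered = "\n".join(
        f"{i}. {h}" for i, h in enumerate(NIELSEN_HEURISTICS, start=1)
    )
    return (
        "You are classifying app reviews.\n"
        "A review concerns usability if it relates to any of these heuristics:\n"
        f"{numbered}\n\n"
        f"Review: {review}\n"
        "Answer with exactly one word: yes or no."
    )


def parse_answer(answer: str) -> bool:
    """Map the model's free-text reply to a boolean label (hypothetical helper)."""
    return answer.strip().lower().startswith("yes")


prompt = build_prompt("I can't find the settings menu anywhere.")
print(prompt)
```

The returned string would be sent to whichever model is under test, and the reply passed through `parse_answer` to get a label that can be scored against the human raters.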

Why It Matters

If the results hold up, product teams could use LLMs to analyze usability feedback at scale without first building a traditional ML pipeline, as long as they invest in careful prompt design.