Developer Tools

Recommending Usability Improvements with Multimodal Large Language Models

New AI approach brings expert-style usability evaluation within reach of teams without dedicated UX specialists

Deep Dive

A team led by Sebastian Lubos from TU Graz has developed a novel method that leverages multimodal large language models (MLLMs) to automate usability evaluation of application user interfaces. Their approach, accepted at FSE 2026, feeds screen recordings of user interactions, together with limited application context, into an MLLM. The model then identifies usability issues based on Nielsen's ten usability heuristics, explains each issue, and generates concrete improvement recommendations. To reduce manual prioritization effort, the system ranks all suggestions by severity, so developers can focus on the most impactful fixes first.
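
The paper's exact prompt and output schema aren't reproduced here, but a minimal Python sketch can illustrate the shape of such a pipeline. Everything below is an assumption for illustration: the `call_mllm` stub stands in for the actual multimodal model call, the JSON issue schema is invented, and severity uses Nielsen's standard 0–4 rating scale.

```python
import json
from dataclasses import dataclass

# Nielsen's ten usability heuristics, used as the labeling vocabulary.
NIELSEN_HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]

@dataclass
class UsabilityIssue:
    heuristic: str       # which of Nielsen's heuristics is violated
    explanation: str     # why the model flags this as a problem
    recommendation: str  # concrete improvement suggestion
    severity: int        # 0 (not a problem) .. 4 (usability catastrophe)

def call_mllm(frames: list, app_context: str) -> str:
    """Placeholder for the multimodal LLM call (video frames + app
    context in, JSON out). Stubbed so the ranking logic is runnable."""
    return json.dumps([
        {"heuristic": "Visibility of system status",
         "explanation": "No progress indicator during the 8 s upload.",
         "recommendation": "Show a determinate progress bar while uploading.",
         "severity": 3},
        {"heuristic": "Error prevention",
         "explanation": "Destructive 'Delete all' needs no confirmation.",
         "recommendation": "Add a confirmation dialog with an undo option.",
         "severity": 4},
    ])

def rank_issues(frames: list, app_context: str) -> list[UsabilityIssue]:
    raw = json.loads(call_mllm(frames, app_context))
    issues = [UsabilityIssue(**item) for item in raw
              if item.get("heuristic") in NIELSEN_HEURISTICS]
    # Most severe first, mirroring the paper's prioritization step.
    return sorted(issues, key=lambda i: i.severity, reverse=True)

for issue in rank_issues(frames=[], app_context="photo-upload flow"):
    print(f"[{issue.severity}] {issue.heuristic}: {issue.recommendation}")
```

Sorting descending by severity is what surfaces the highest-impact fixes first; the heuristic filter also drops any model output that doesn't map cleanly onto Nielsen's vocabulary.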

In a user study involving software engineers as evaluators, the team assessed the quality and practical usefulness of the highest-ranked recommendations. Results demonstrated that the MLLM-driven approach can produce low-effort, actionable usability insights that closely mirror those from expert evaluators. This makes it a promising complement to traditional methods, especially for small teams and organizations with limited access to usability experts. The authors envision future integration into development tools, enabling continuous, automated usability evaluation within standard software engineering workflows.
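
To make the envisioned workflow integration concrete, here is a hypothetical sketch of a CI gate built on the `rank_issues` function from the sketch above. The module name, severity threshold, and exit-code convention are all assumptions, not details from the paper.

```python
import sys

# Assumes the previous sketch is saved as usability_eval.py.
from usability_eval import rank_issues

SEVERITY_THRESHOLD = 3  # block merges on "major" (3) and "catastrophic" (4)

def usability_gate(recording_frames: list, app_context: str) -> int:
    """Return a process exit code; non-zero fails the CI job."""
    issues = rank_issues(recording_frames, app_context)
    blocking = [i for i in issues if i.severity >= SEVERITY_THRESHOLD]
    for issue in blocking:
        print(f"BLOCKING [{issue.severity}] {issue.heuristic}: "
              f"{issue.recommendation}")
    return 1 if blocking else 0

if __name__ == "__main__":
    sys.exit(usability_gate(recording_frames=[], app_context="checkout flow"))
```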

Key Points
  • Uses multimodal LLMs (e.g., GPT-4V-style models) to process screen recordings and context
  • Automatically maps issues to Nielsen's 10 usability heuristics and ranks fixes by severity
  • Validated in a user study with software engineers; top-ranked suggestions rated as high quality and practically useful

Why It Matters

Democratizes UX testing for small teams, reducing reliance on scarce usability experts.