Research & Papers

UX in the Age of AI: Rethinking Evaluation Metrics Through a Statistical Lens

Legacy UX metrics fail for AI—new framework uses entropy and Bayesian confidence

Deep Dive

Traditional UX metrics like SUS, NPS, and task completion rates were built for deterministic, rule-based interfaces where identical inputs yield identical outputs. But AI-powered products—chatbots, generative interfaces, recommendation engines—produce stochastic, context-sensitive, and temporally variable results. This renders legacy metrics structurally insufficient for capturing real user experience. Harish Vijayakumar’s paper, published on arXiv, tackles this gap by introducing ADUX-Stat (Adaptive Dynamic UX Statistical Framework), which reconceptualizes usability as a probabilistic signal distribution rather than a static score.

ADUX-Stat integrates three original constructs: Interaction Entropy Index (IEI) to measure the unpredictability of AI responses from the user's perspective; Temporal Drift Coefficient (TDC) to gauge how perceived usability changes over repeated sessions; and Bayesian Usability Confidence Score (BUCS) to express usability quality as credible-interval estimates under uncertainty. The framework is validated conceptually across five established AI product categories. It provides a reproducible, field-deployable methodology for researchers and practitioners, bridging HCI, statistical modeling, and AI product evaluation.
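The paper's exact formulas are not reproduced here, so the following Python sketch shows one plausible reading of the first two constructs: IEI as normalized Shannon entropy over user-perceived categories of AI responses, and TDC as the least-squares slope of per-session usability ratings. Function names, inputs, and definitions are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter

def interaction_entropy_index(response_labels):
    """Illustrative IEI: normalized Shannon entropy over user-perceived
    response categories (the paper's exact formulation may differ)."""
    counts = Counter(response_labels)
    n = len(response_labels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return entropy / max_entropy  # 0 = fully predictable, 1 = maximally unpredictable

def temporal_drift_coefficient(session_scores):
    """Illustrative TDC: least-squares slope of usability ratings across
    sessions, so a negative value indicates eroding perceived usability."""
    n = len(session_scores)
    mean_x, mean_y = (n - 1) / 2, sum(session_scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(session_scores))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# Same prompt issued five times; responses bucketed by perceived type
labels = ["direct_answer", "direct_answer", "clarifying_question",
          "refusal", "direct_answer"]
print(round(interaction_entropy_index(labels), 3))        # unpredictability in [0, 1]
print(temporal_drift_coefficient([4.2, 4.0, 3.7, 3.5]))   # declining across sessions
```

Normalizing the entropy keeps IEI comparable across products whose responses fall into different numbers of categories.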

Key Points
  • Legacy UX metrics (SUS, NPS, task completion rate) fail for AI because outputs are stochastic and context-dependent.
  • ADUX-Stat introduces Interaction Entropy Index (IEI) to quantify the unpredictability of AI responses from a user perception standpoint.
  • Bayesian Usability Confidence Score (BUCS) provides credible interval estimates of usability quality, accounting for uncertainty in AI interactions.
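BUCS reports an interval rather than a point score. As a minimal sketch, assuming each sampled interaction receives a binary "usable / not usable" judgment, a Beta posterior under a uniform prior yields such a credible interval; the paper's actual model may be richer. The function below is a hypothetical stdlib-only illustration that inverts the posterior CDF numerically.

```python
import math

def beta_credible_interval(successes, failures, level=0.95, grid=100_000):
    """Illustrative BUCS-style estimate: posterior Beta(a, b) over the
    probability that an interaction is judged usable, with a uniform
    Beta(1, 1) prior. Interval found by numerically inverting the CDF."""
    a, b = successes + 1, failures + 1
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    xs = [(i + 0.5) / grid for i in range(grid)]
    pdf = [math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)
           for x in xs]
    # cumulative distribution via Riemann sum over the grid
    cdf, total = [], 0.0
    for p in pdf:
        total += p / grid
        cdf.append(total)
    lo = next(x for x, c in zip(xs, cdf) if c >= (1 - level) / 2)
    hi = next(x for x, c in zip(xs, cdf) if c >= 1 - (1 - level) / 2)
    return lo, hi

# 42 of 50 sampled interactions rated usable
low, high = beta_credible_interval(42, 8)
print(f"95% credible interval: [{low:.2f}, {high:.2f}]")
```

Reporting the interval width alongside the estimate tells a practitioner how much of the score is signal and how much is sampling uncertainty, which is exactly what a single SUS number cannot convey.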

Why It Matters

Enables UX professionals to reliably evaluate AI products by treating usability as a probabilistic distribution, not a static score.