Research & Papers

AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents

New mobile GUI agent solves the transparency vs. multitasking trade-off with three adaptive visual modes.

Deep Dive

A research team from Seoul National University and KAIST has introduced AgentLens, a novel mobile GUI agent designed to solve a core usability problem in human-AI interaction. Current mobile agents operate in two flawed modes: foreground execution, which blocks the screen, or background execution, which offers no visual feedback. Through formative studies, the team found users wanted a hybrid, just-in-time approach. AgentLens addresses this by dynamically choosing between three visual communication modalities—Full UI (complete overlay), Partial UI (minimal indicators), and GenUI (generated visual summaries)—based on the task context. This is enabled by a 'Virtual Display' system that allows the agent to run in the background while selectively projecting visual overlays.
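The adaptive selection can be pictured as a small policy function. The mode names below mirror the paper's three modalities, but the decision inputs and branching logic are illustrative assumptions for the sketch, not the authors' actual implementation:

```python
from enum import Enum, auto

class Modality(Enum):
    FULL_UI = auto()     # complete overlay: the user watches every step
    PARTIAL_UI = auto()  # minimal indicators: progress cues without taking the screen
    GEN_UI = auto()      # generated visual summary of what the agent has done

def choose_modality(needs_confirmation: bool, user_is_multitasking: bool) -> Modality:
    # Hypothetical policy: inputs and ordering are assumptions, not AgentLens's logic.
    if needs_confirmation:
        return Modality.FULL_UI    # hand the screen back for sensitive steps
    if user_is_multitasking:
        return Modality.GEN_UI     # stay out of the way, summarize progress
    return Modality.PARTIAL_UI     # lightweight progress indicator by default
```

In this framing, the 'Virtual Display' would run the agent's actions off-screen while only the chosen modality's overlay is projected to the user.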

In a controlled user study with 21 participants, AgentLens demonstrated significant advantages. It was the preferred system for 85.7% of users, achieved the best usability rating (1.94 on the PSSUQ, where lower scores indicate higher usability), and scored 6.43 out of 7 for adoption intent. The adaptive system lets users stay aware of an agent's progress on tasks like booking a flight or ordering food, without being forced to watch every step or losing the ability to multitask. This research, published on arXiv, represents a critical step toward practical AI assistants that are both powerful and unobtrusive, moving beyond pure automation to consider the human in the loop.

Key Points
  • Solves the transparency-multitasking trade-off by adaptively using three visual modes: Full UI, Partial UI, and GenUI.
  • Achieved 85.7% user preference and a strong usability rating (1.94 on the PSSUQ, lower is better) in a study with 21 participants.
  • Uses a 'Virtual Display' to enable background execution with selective visual overlays, preserving user workflow.

Why It Matters

It makes AI mobile assistants practically usable by letting users monitor tasks without losing screen control, a key hurdle for adoption.