Pro²Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks
Tracks your every step with up to 2.29x better proactive-timing accuracy than baselines
Pro²Assist addresses a key gap in current personal assistants: they react to queries but cannot proactively guide users through multi-step tasks like assembling furniture or cooking complex recipes. The system integrates multimodal egocentric perception from AR glasses, capturing video, motion, and spatial data, to extract step-oriented procedural context. It leverages multimodal large language models (MLLMs) to reason over fine-grained task progress and user state, then displays timely AR overlays without waiting for explicit commands.
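The summary doesn't include code, but the loop it describes (perceive continuously, ground the observation against the procedure with an MLLM, push an overlay the moment a step completes) is easy to sketch. Below is a minimal, self-contained Python sketch with stub interfaces; `StubGlasses`, `StubMLLM`, `infer_progress`, and `show_overlay` are all hypothetical names for illustration, not the authors' actual API.

```python
import random
from dataclasses import dataclass

# --- Hypothetical interfaces (illustrative names, not from the paper) ---

@dataclass
class Observation:
    frame: str      # stand-in for an egocentric video frame
    motion: float   # stand-in for IMU / motion data

@dataclass
class ProgressState:
    step_completed: bool
    error_hint: str = ""  # non-empty when the user seems off-track

class StubGlasses:
    """Stand-in for an AR-glasses SDK: capture sensors, draw overlays."""
    def capture(self) -> Observation:
        return Observation(frame="<frame>", motion=random.random())

    def show_overlay(self, text: str) -> None:
        print(f"[AR overlay] {text}")

class StubMLLM:
    """Stand-in for the MLLM that judges fine-grained step progress."""
    def infer_progress(self, obs, steps, idx) -> ProgressState:
        # Toy heuristic: pretend high motion means the step finished.
        return ProgressState(step_completed=obs.motion > 0.5)

# --- Proactive loop: hints are pushed without any explicit user query ---

def proactive_loop(glasses, mllm, steps, max_ticks=50):
    current = 0
    glasses.show_overlay(steps[current])
    for _ in range(max_ticks):
        if current >= len(steps) - 1:
            break
        state = mllm.infer_progress(glasses.capture(), steps, current)
        if state.error_hint:
            glasses.show_overlay(f"Check: {state.error_hint}")
        elif state.step_completed:
            current += 1
            glasses.show_overlay(steps[current])  # proactive next-step hint

proactive_loop(StubGlasses(), StubMLLM(),
               ["Attach leg A", "Insert bolt B", "Tighten with hex key"])
```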
Evaluated on both public datasets and real-world AR glasses testbeds, Pro²Assist achieved over 21% higher procedural action understanding accuracy than the best baselines and up to 2.29x better timing accuracy for proactive interventions. In a user study with 20 participants, 90% rated it useful for real-world assistance. This work points to a future where AI assistants don't just answer questions but anticipate needs and guide users through complex workflows in real time.
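The summary doesn't define "timing accuracy" precisely; one common reading for proactive systems is the fraction of ground-truth intervention points that the system hits within a tolerance window. A minimal sketch under that assumption (`timing_accuracy` and the 2-second tolerance are illustrative choices, not the paper's metric definition):

```python
def timing_accuracy(predicted, ground_truth, tolerance=2.0):
    """Fraction of ground-truth intervention times matched by a prediction
    within +/- tolerance seconds, using greedy one-to-one matching.

    predicted, ground_truth: sorted lists of timestamps in seconds.
    """
    matched = 0
    used = [False] * len(predicted)
    for gt in ground_truth:
        for i, p in enumerate(predicted):
            if not used[i] and abs(p - gt) <= tolerance:
                used[i] = True
                matched += 1
                break
    return matched / len(ground_truth) if ground_truth else 1.0

# Example: 3 of 4 interventions fire within 2 s of the true step boundary.
print(timing_accuracy([10.5, 31.0, 58.9, 80.0], [10.0, 30.0, 60.0, 95.0]))  # 0.75
```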
- Pro²Assist uses AR glasses' multimodal data (video, motion, spatial) to track step-by-step task progress
- Outperforms the best baselines by over 21% in action understanding accuracy and by up to 2.29x in proactive timing
- 90% of 20 study participants found it useful for real-world procedural assistance
Why It Matters
Proactive AI guidance for multi-step tasks could transform AR-assisted workflows and reduce errors.