Pro²Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks
Tracks your every step with up to 2.29x better proactive-timing accuracy than baselines
Pro²Assist addresses a key gap in current personal assistants: they react to queries but cannot proactively guide users through multi-step tasks like assembling furniture or cooking complex recipes. The system integrates multimodal egocentric perception from AR glasses, capturing video, motion, and spatial data, to extract step-oriented procedural context. It leverages multimodal large language models (MLLMs) to reason over fine-grained task progress and user state, then displays timely AR overlays without waiting for explicit commands.
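The summary doesn't include code, but the loop it describes (perceive continuously, ground the observation against the procedure with an MLLM, push an overlay the moment a step completes) is easy to sketch. Below is a minimal, self-contained Python sketch with stub interfaces; `StubGlasses`, `StubMLLM`, `infer_progress`, and `show_overlay` are all hypothetical names for illustration, not the authors' actual API.

```python
import random
from dataclasses import dataclass

# --- Hypothetical interfaces (illustrative names, not from the paper) ---

@dataclass
class Observation:
    frame: str      # stand-in for an egocentric video frame
    motion: float   # stand-in for IMU / motion data

@dataclass
class ProgressState:
    step_completed: bool
    error_hint: str = ""  # non-empty when the user seems off-track

class StubGlasses:
    """Stand-in for an AR-glasses SDK: capture sensors, draw overlays."""
    def capture(self) -> Observation:
        return Observation(frame="<frame>", motion=random.random())

    def show_overlay(self, text: str) -> None:
        print(f"[AR overlay] {text}")

class StubMLLM:
    """Stand-in for the MLLM that judges fine-grained step progress."""
    def infer_progress(self, obs, steps, idx) -> ProgressState:
        # Toy heuristic: pretend high motion means the step finished.
        return ProgressState(step_completed=obs.motion > 0.5)

# --- Proactive loop: hints are pushed without any explicit user query ---

def proactive_loop(glasses, mllm, steps, max_ticks=50):
    current = 0
    glasses.show_overlay(steps[current])
    for _ in range(max_ticks):
        if current >= len(steps) - 1:
            break
        state = mllm.infer_progress(glasses.capture(), steps, current)
        if state.error_hint:
            glasses.show_overlay(f"Check: {state.error_hint}")
        elif state.step_completed:
            current += 1
            glasses.show_overlay(steps[current])  # proactive next-step hint

proactive_loop(StubGlasses(), StubMLLM(),
               ["Attach leg A", "Insert bolt B", "Tighten with hex key"])
```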
Evaluated on both public datasets and real-world AR glasses testbeds, Pro²Assist achieved over 21% higher procedural action understanding accuracy than the best baselines and up to 2.29x better timing accuracy for proactive interventions. In a user study with 20 participants, 90% rated it useful for real-world assistance. This work points to a future where AI assistants don't just answer questions but anticipate needs and guide users through complex workflows in real time.
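The summary doesn't define "timing accuracy" precisely; one common reading for proactive systems is the fraction of ground-truth intervention points that the system hits within a tolerance window. A minimal sketch under that assumption (`timing_accuracy` and the 2-second tolerance are illustrative choices, not the paper's metric definition):

```python
def timing_accuracy(predicted, ground_truth, tolerance=2.0):
    """Fraction of ground-truth intervention times matched by a prediction
    within +/- tolerance seconds, using greedy one-to-one matching.

    predicted, ground_truth: sorted lists of timestamps in seconds.
    """
    matched = 0
    used = [False] * len(predicted)
    for gt in ground_truth:
        for i, p in enumerate(predicted):
            if not used[i] and abs(p - gt) <= tolerance:
                used[i] = True
                matched += 1
                break
    return matched / len(ground_truth) if ground_truth else 1.0

# Example: 3 of 4 interventions fire within 2 s of the true step boundary.
print(timing_accuracy([10.5, 31.0, 58.9, 80.0], [10.0, 30.0, 60.0, 95.0]))  # 0.75
```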
- Pro²Assist uses AR glasses' multimodal data (video, motion, spatial) to track step-by-step task progress
- Outperforms the best baselines by over 21% in action understanding accuracy and by up to 2.29x in proactive timing
- 90% of 20 study participants found it useful for real-world procedural assistance
Why It Matters
Proactive AI guidance for multi-step tasks could transform AR-assisted workflows and reduce errors.