Where Did It Go Wrong? Capability-Oriented Failure Attribution for Vision-and-Language Navigation Agents

arXiv cs.MA April 29, 2026

⚡Researchers can now attribute VLN agent failures to specific capability deficiencies...

Deep Dive

Researchers from the Chinese Academy of Sciences and Singapore Management University have developed a testing method for Vision-and-Language Navigation (VLN) agents that attributes a task failure to the capability that caused it: perception, memory, planning, or decision-making. Published on arXiv (2604.25161) and submitted to ACL 2026, the approach addresses a gap in embodied AI testing. Existing methods treat agents as black boxes, which makes it hard to localize a failure to a specific capability deficiency.

The method works in three stages: adaptive test case generation via seed selection and mutation, capability oracles that identify capability-specific errors, and a feedback mechanism that attributes failures to capabilities and guides further test generation. In experiments, the approach discovered more failure cases and identified capability-level deficiencies more accurately than state-of-the-art baselines. The result gives developers interpretable, actionable insights for improving embodied agents in safety-critical applications such as robotics and autonomous navigation.
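The three-stage loop can be pictured with a toy sketch. Everything below is an illustrative assumption rather than the paper's implementation: the test case is reduced to one "stress" value per capability, `run_toy_agent` and `TOY_LIMITS` stand in for a real VLN agent, and the weighting rule in the feedback step is a simple placeholder for whatever the authors actually use.

```python
import random
from collections import Counter

CAPABILITIES = ("perception", "memory", "planning", "decision-making")

# Hypothetical agent under test: each capability tolerates stress up to a limit.
TOY_LIMITS = {"perception": 0.9, "memory": 0.6,
              "planning": 0.8, "decision-making": 0.95}


def run_toy_agent(case):
    """Return a trace mapping each capability to True (ok) or False (erred)."""
    return {cap: case[cap] <= TOY_LIMITS[cap] for cap in CAPABILITIES}


def make_oracle(cap):
    """Build a capability oracle that flags errors for one capability only."""
    return lambda trace: not trace[cap]


ORACLES = {cap: make_oracle(cap) for cap in CAPABILITIES}


def mutate(case, focus, rng, step=0.15):
    """Raise the stress on the focused capability, clamped to at most 1.0."""
    child = dict(case)
    child[focus] = min(1.0, child[focus] + rng.uniform(0.0, step))
    return child


def attribute_failures(seeds, budget=200, rng_seed=0):
    """Run the generate / check / feedback loop for a fixed test budget."""
    rng = random.Random(rng_seed)
    pool = [dict(s) for s in seeds]
    attribution = Counter()   # capability -> number of failures blamed on it
    failures = []
    for _ in range(budget):
        # Stage 1: seed selection and mutation.
        seed = rng.choice(pool)
        # Stage 3 feeds back here: capabilities blamed more often so far
        # are more likely to be the target of the next mutation.
        weights = [1 + attribution[c] for c in CAPABILITIES]
        focus = rng.choices(CAPABILITIES, weights=weights)[0]
        case = mutate(seed, focus, rng)
        # Stage 2: capability oracles inspect the agent's trace.
        trace = run_toy_agent(case)
        blamed = [c for c in CAPABILITIES if ORACLES[c](trace)]
        if blamed:
            failures.append(case)
            attribution.update(blamed)
        else:
            pool.append(case)  # passing cases become seeds for harder tests
    return attribution, failures


if __name__ == "__main__":
    start = [{c: 0.5 for c in CAPABILITIES}]
    counts, found = attribute_failures(start)
    print(len(found), "failing cases;", dict(counts))
```

The point of the sketch is the shape of the loop: oracles turn a failed run into a per-capability label, and those labels bias where the next mutations land. In a real setting the test case would be a navigation instruction and environment, and each oracle would check the agent's intermediate outputs rather than a single threshold.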

Key Points

Method attributes failures to specific capabilities: perception, memory, planning, or decision-making
Uses adaptive test case generation via seed selection and mutation to discover more failure cases
Identifies capability-level deficiencies more accurately than state-of-the-art baselines and yields interpretable, actionable results

Why It Matters

Makes debugging embodied AI agents more precise, accelerating development of safer autonomous systems.
