Where Did It Go Wrong? Capability-Oriented Failure Attribution for Vision-and-Language Navigation Agents
Researchers can now attribute VLN agent failures to specific capability deficiencies.
Researchers from the Chinese Academy of Sciences and Singapore Management University have developed a new testing method for Vision-and-Language Navigation (VLN) agents that can pinpoint which capability—perception, memory, planning, or decision-making—caused a task failure. Published on arXiv (2604.25161) and submitted to ACL 2026, the approach addresses a critical gap in embodied AI testing: existing methods treat agents as black boxes, making it nearly impossible to localize failures to specific capability deficiencies.
The method works in three stages: adaptive test case generation via seed selection and mutation, capability oracles that identify capability-specific errors, and a feedback mechanism that attributes failures to capabilities and guides further test generation. In experiments, the approach discovered more failure cases and more accurately identified capability-level deficiencies compared to state-of-the-art baselines. This provides developers with interpretable, actionable insights for improving embodied agents in safety-critical applications like robotics and autonomous navigation.
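The three-stage loop described above can be sketched in miniature. Everything here is an illustrative assumption, not the authors' actual implementation: the capability list comes from the article, but `mutate`, `run_agent`, the oracle outputs, and the seed-scoring scheme are hypothetical stand-ins for the paper's test generation, capability oracles, and feedback mechanism.

```python
import random

# Capabilities named in the paper's attribution scheme.
CAPABILITIES = ["perception", "memory", "planning", "decision-making"]

def mutate(seed):
    """Toy mutation: mark one instruction token (placeholder for the
    paper's instruction/scene mutations)."""
    tokens = seed.split()
    i = random.randrange(len(tokens))
    tokens[i] = tokens[i] + "*"
    return " ".join(tokens)

def run_agent(case):
    """Stand-in for executing the VLN agent under capability oracles.
    Returns a per-capability error flag; faked deterministically here
    so the sketch is self-contained."""
    return {cap: ("*" in case and cap == "planning") for cap in CAPABILITIES}

def attribute_failures(seeds, budget=20):
    """Adaptive loop: select promising seeds, mutate them into new test
    cases, check capability oracles, and feed failures back into both
    the attribution tally and the seed-selection weights."""
    scores = {s: 1.0 for s in seeds}            # seed-selection weights
    attribution = {cap: 0 for cap in CAPABILITIES}
    for _ in range(budget):
        # Stage 1: pick a seed, favoring those that exposed failures before.
        seed = max(scores, key=lambda s: scores[s] + random.random())
        case = mutate(seed)
        # Stage 2: capability oracles flag capability-specific errors.
        errors = run_agent(case)
        failed = [cap for cap, bad in errors.items() if bad]
        # Stage 3: feedback attributes failures and guides generation.
        for cap in failed:
            attribution[cap] += 1
        scores[seed] += len(failed)
    return attribution
```

In this toy run, every mutated case trips the fake planning oracle, so the tally attributes all failures to planning; in the real system, the oracle verdicts would come from executing the agent in the navigation environment.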
- Method attributes failures to specific capabilities: perception, memory, planning, or decision-making
- Uses adaptive test case generation via seed selection and mutation to discover more failure cases
- Outperforms state-of-the-art baselines in failure-attribution accuracy and interpretability
Why It Matters
Makes debugging embodied AI agents more precise, accelerating development of safer autonomous systems.