Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies
A new method generates thousands of natural language instructions to find hidden failure modes in robot AI.
A research team from institutions including UC Berkeley and the University of Washington has introduced Q-DIG, a framework for stress-testing the robustness of the Vision-Language-Action (VLA) models that power general-purpose robots. VLAs, which translate visual and language inputs into physical actions, are notoriously brittle: their performance can collapse under slight changes in instruction wording. Q-DIG automates the discovery of diverse natural language prompts that cause the robot to fail, using Quality Diversity (QD) search to systematically explore a wide range of semantically valid but potentially problematic instructions.
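As a rough illustration of the Quality Diversity idea, the sketch below runs a MAP-Elites-style loop over instruction wordings. Everything here is a stand-in, not Q-DIG's actual pipeline: `mutate_prompt` mocks the VLM-based rewriter with synonym swaps, `failure_score` mocks a policy rollout, and the behaviour descriptor is a simple (length, first verb) bucket.

```python
# Minimal MAP-Elites-style sketch of QD prompt search.
# All components are illustrative stand-ins: a real system would use a
# VLM to rewrite prompts and a robot rollout to score failures.
import random

SYNONYMS = {"pick": ["grab", "lift", "take"], "place": ["put", "set", "drop"]}

def mutate_prompt(prompt: str, rng: random.Random) -> str:
    """Stand-in for a VLM rewriter: swap one known word for a synonym."""
    words = prompt.split()
    idx = [i for i, w in enumerate(words) if w in SYNONYMS]
    if idx:
        i = rng.choice(idx)
        words[i] = rng.choice(SYNONYMS[words[i]])
    return " ".join(words)

def failure_score(prompt: str) -> float:
    """Stand-in for a policy rollout: pretend rarer wordings fail more."""
    return sum(prompt.count(w) for ws in SYNONYMS.values() for w in ws)

def descriptor(prompt: str) -> tuple:
    """Behaviour descriptor: bucket prompts by length and leading verb."""
    words = prompt.split()
    return (len(words), words[0])

def qd_search(seed_prompt: str, iters: int = 200, seed: int = 0) -> dict:
    """Keep the highest-failure (elite) prompt in each descriptor niche."""
    rng = random.Random(seed)
    archive = {descriptor(seed_prompt): (failure_score(seed_prompt), seed_prompt)}
    for _ in range(iters):
        _, parent = rng.choice(list(archive.values()))
        child = mutate_prompt(parent, rng)
        key, score = descriptor(child), failure_score(child)
        if key not in archive or score > archive[key][0]:
            archive[key] = (score, child)  # new niche or better elite
    return archive

archive = qd_search("pick up the red block and place it on the tray")
```

The point of the archive structure is diversity: rather than converging on one worst-case prompt, the search retains the strongest failure case per niche, which is what yields a broad spectrum of distinct failure modes.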
In tests across multiple simulation benchmarks, Q-DIG outperformed baseline methods, generating a broader spectrum of adversarial prompts that a user study judged more natural and human-like. Crucially, the failure-inducing prompts are not only diagnostic; they double as training data. Fine-tuning the VLA models on the failures Q-DIG uncovers yielded significant gains in success rates on new, unseen instructions, and real-world hardware evaluations confirmed the simulation results, demonstrating the method's practical value for building robots that can handle the unpredictable nature of human language.
- Q-DIG uses Quality Diversity algorithms with VLMs to scalably generate thousands of diverse, task-relevant instructions that break robot AI.
- In tests, it found more meaningful failure modes than baselines, and its prompts were rated as more natural in a user study.
- Fine-tuning VLA models on the failure data from Q-DIG improved real-world task success rates on novel instructions.
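One hedged reading of "failures become training data" is a simple pairing step: each adversarial rewording asks for the same behaviour as its canonical instruction, so it can reuse that instruction's demonstration. The helper below sketches this idea; `build_finetune_set` and its inputs are hypothetical names for illustration, not the paper's API.

```python
# Illustrative sketch (not the paper's pipeline): pair QD-discovered
# failure prompts with the demonstration recorded for the original
# instruction to form a fine-tuning dataset.
def build_finetune_set(demos: dict, failures: dict) -> list:
    """demos: {canonical_instruction: action_trajectory}
    failures: {canonical_instruction: [failure-inducing rewordings]}"""
    dataset = []
    for canonical, trajectory in demos.items():
        dataset.append({"instruction": canonical, "actions": trajectory})
        for reworded in failures.get(canonical, []):
            # The rewording requests the same behaviour, so the
            # canonical demonstration still supervises it.
            dataset.append({"instruction": reworded, "actions": trajectory})
    return dataset
```

Training on such pairs teaches the policy that many surface wordings map to one behaviour, which is the mechanism by which robustness to novel instructions could improve.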
Why It Matters
This is a crucial step toward deploying safe, reliable robots that won't fail unpredictably when humans phrase the same command in slightly different ways.