SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models
The SABER framework reduces robot task success rates by 20.6% using only tiny text edits to instructions.
A research team from the University of Maryland, Meta, and other institutions has developed SABER (Stealthy Agentic Black-Box Attack Framework), a method for exposing vulnerabilities in the Vision-Language-Action (VLA) models that power modern robots. These models, which enable robots to follow natural-language instructions grounded in visual observations, have a critical weakness: small textual perturbations in an instruction can dramatically alter downstream robot behavior. SABER exploits this with an attacker agent trained via GRPO (Group Relative Policy Optimization) and built on the ReAct pattern of interleaved reasoning and acting; the agent generates minimal, plausible edits to instructions using character-, token-, and prompt-level tools under strict edit budgets.
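The summary above describes the attack loop only at a high level. As a rough sketch, it might look like the following, where the tool names (`perturb_chars`, `swap_token`), the budget value, and the greedy random tool choice (standing in for the GRPO-trained policy) are illustrative assumptions rather than SABER's actual implementation:

```python
# Illustrative sketch of an agentic black-box instruction attack under an edit budget.
# Tool names, budget value, and the random tool choice (standing in for a
# GRPO-trained policy) are assumptions, not SABER's actual implementation.
import random
from difflib import SequenceMatcher

EDIT_BUDGET = 6  # assumed maximum number of character edits per instruction


def char_edits(original: str, edited: str) -> int:
    """Approximate character-level edit count between two strings."""
    matched = sum(b.size for b in SequenceMatcher(None, original, edited).get_matching_blocks())
    return max(len(original), len(edited)) - matched


def perturb_chars(text: str) -> str:
    """Character-level tool: transpose two adjacent characters."""
    if len(text) < 2:
        return text
    i = random.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def swap_token(text: str) -> str:
    """Token-level tool: replace one word with a plausible near-synonym."""
    synonyms = {"place": "put", "pick": "grab", "close": "shut"}
    words = text.split()
    for i, w in enumerate(words):
        if w in synonyms:
            words[i] = synonyms[w]
            break
    return " ".join(words)


TOOLS = [perturb_chars, swap_token]


def attack(instruction: str, run_episode) -> str:
    """ReAct-style loop: act with an edit tool, observe the victim VLA's rollout,
    and keep only perturbations that stay within the stealth budget."""
    current = instruction
    for _ in range(10):  # bounded number of tool calls
        if not run_episode(current):  # victim already fails on this instruction
            return current
        candidate = random.choice(TOOLS)(current)
        if char_edits(instruction, candidate) <= EDIT_BUDGET:
            current = candidate
    return current
```

In the paper the attacker's tool choice is learned with GRPO rather than sampled at random; the sketch only fixes the interface implied by the description: black-box rollouts of the victim model, discrete edit tools, and a hard edit budget.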
On the LIBERO benchmark across six state-of-the-art VLA models, SABER proved remarkably effective. It reduced task success rates by 20.6%, increased action-sequence length by 55%, and raised constraint violations by 33%. Crucially, it achieved these results while requiring 21.1% fewer tool calls and 54.7% fewer character edits than strong GPT-based baselines, making it both efficient and stealthy. The framework induces targeted behavioral degradation including task failure, unnecessarily long execution, and increased safety violations.
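For context on how such aggregate numbers are typically computed, here is a minimal, hypothetical evaluation harness contrasting clean and attacked rollouts; the `Episode` fields and metric names are assumptions, not the paper's evaluation code.

```python
# Hypothetical aggregation of behavioral-degradation metrics over paired rollouts.
# Field and metric names are illustrative, not taken from the paper.
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    success: bool      # did the robot complete the task?
    steps: int         # length of the executed action sequence
    violations: int    # number of constraint/safety violations


def relative_increase(attacked: list[float], clean: list[float]) -> float:
    """Relative change of the attacked mean over the clean mean."""
    base = mean(clean)
    return (mean(attacked) - base) / base if base else float("inf")


def degradation(clean: list[Episode], attacked: list[Episode]) -> dict:
    return {
        # absolute drop in task success rate between clean and attacked runs
        "success_drop": mean(e.success for e in clean) - mean(e.success for e in attacked),
        "length_increase": relative_increase([e.steps for e in attacked], [e.steps for e in clean]),
        "violation_increase": relative_increase([e.violations for e in attacked], [e.violations for e in clean]),
    }
```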
The research demonstrates that current robotic foundation models are surprisingly vulnerable to subtle adversarial attacks through their instruction channels. Unlike traditional white-box attacks that require full model access, SABER operates as a black-box attacker, making it practical for real-world security testing. This agentic approach offers a scalable method for red-teaming—systematically testing for vulnerabilities—in robotic AI systems before deployment.
- SABER reduces robot task success by 20.6% with minimal text edits to instructions
- Increases action-sequence length by 55% and constraint violations by 33% across six VLA models
- Uses 21.1% fewer tool calls and 54.7% fewer character edits than GPT-based attack methods
Why It Matters
Exposes critical security flaws in robot AI that could be exploited through simple text manipulation, underscoring the need for stronger robustness testing before deployment.