Robotics

VLAMotor raises real-robot VLA success rate by 57.5% using failure data

New framework turns edge-case failures into training data for VLA models

Deep Dive

Vision-Language-Action (VLA) models, critical for robotic manipulation, are data-driven and often fail on edge-case configurations not covered in training. Existing testing work stops at failure detection and offers no mechanism for repairing the model. VLAMotor, introduced by Zeqin Liao and colleagues, is described by its authors as the first analysis framework that both exposes failures and converts them into supervisory data. It uses distance-aware testing to estimate input uncertainty and rank failures, then eliminates redundant cases to produce compact test sets that still uncover diverse failures.
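The summary does not say how VLAMotor measures distance or uncertainty, so the following is only a minimal sketch of the general idea under stated assumptions. It treats each test configuration as a plain numeric vector (for example, object positions), uses the distance to the nearest training configuration as a stand-in for input uncertainty, and applies greedy distance-based pruning as the redundancy-elimination step. The function names and the `min_gap` and `budget` parameters are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest_distance(x: Vector, references: Sequence[Vector]) -> float:
    """Euclidean distance from x to its closest reference configuration."""
    return min(math.dist(x, r) for r in references)


def rank_by_novelty(candidates: Sequence[Vector], training: Sequence[Vector]) -> List[Vector]:
    """Sort candidates farthest-from-training first.

    Assumption: nearest-neighbor distance is used here as a crude proxy
    for input uncertainty; VLAMotor's actual estimator may differ.
    """
    return sorted(candidates, key=lambda c: nearest_distance(c, training), reverse=True)


def select_test_suite(candidates: Sequence[Vector], training: Sequence[Vector],
                      min_gap: float, budget: int) -> List[Vector]:
    """Greedily keep high-ranked cases at least `min_gap` apart, up to `budget`."""
    kept: List[Vector] = []
    for c in rank_by_novelty(candidates, training):
        # Skip near-duplicates of a case already in the suite.
        if all(math.dist(c, k) >= min_gap for k in kept):
            kept.append(c)
        if len(kept) == budget:
            break
    return kept


training = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
candidates = [(0.05, 0.05), (0.9, 0.9), (0.88, 0.91), (0.5, -0.4)]
print(select_test_suite(candidates, training, min_gap=0.1, budget=2))
# [(0.9, 0.9), (0.5, -0.4)]
```

Here (0.88, 0.91) is dropped as a near-duplicate of (0.9, 0.9), so the two-case budget is spent on configurations that differ meaningfully from each other.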

VLAMotor abstracts failure trajectories into structured semantic representations and plans parameterized repair-skill sequences, executed via inverse kinematics and motion control. The resulting successful trajectories are automatically labeled and used to fine-tune the original VLA model. Evaluated on four robotic manipulation tasks, VLAMotor triggers failures in 92.33% of test cases and improves test coverage by 18.93% over state-of-the-art tools. Fine-tuning on synthetic data boosts overall simulation success rate by 49.25%, and when deployed on real hardware, models enhanced in simulation achieve a 57.50% higher success rate than the original.
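The repair stage can be pictured with a toy example. This sketch is not VLAMotor's planner. It assumes a hypothetical skill sequence whose parameters are 2-D targets, a planar two-link arm solved with the standard closed-form inverse kinematics, and a simplified success check (every target is reachable) in place of real motion control. Successful plans become labeled samples of the kind that could feed fine-tuning; `Skill`, `execute_plan`, and the sample fields are illustrative names only.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Skill:
    """A hypothetical parameterized repair skill: a name plus a 2-D target."""
    name: str                    # e.g. "reach", "grasp", "lift" (illustrative)
    target: Tuple[float, float]


def two_link_ik(x: float, y: float, l1: float = 1.0, l2: float = 1.0) -> Optional[Tuple[float, float]]:
    """Closed-form IK for a planar 2-link arm (solution with elbow angle in [0, pi]).

    Returns (shoulder, elbow) in radians, or None if the target is unreachable.
    """
    cos_elbow = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        return None
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow), l1 + l2 * math.cos(elbow))
    return shoulder, elbow


def forward_kinematics(shoulder: float, elbow: float, l1: float = 1.0, l2: float = 1.0) -> Tuple[float, float]:
    """End-effector position for the given joint angles (used to check IK)."""
    return (l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow),
            l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow))


def execute_plan(plan: List[Skill], instruction: str) -> Optional[List[dict]]:
    """Convert each skill target to joint angles and label the trajectory.

    Assumption: a plan counts as successful only if every target is reachable;
    a real system would run closed-loop motion control and check the task outcome.
    """
    samples = []
    for step, skill in enumerate(plan):
        joints = two_link_ik(*skill.target)
        if joints is None:
            return None  # unreachable target: discard the whole trajectory
        samples.append({"instruction": instruction, "step": step,
                        "skill": skill.name, "action": joints})
    return samples


plan = [Skill("reach", (1.2, 0.5)), Skill("grasp", (1.2, 0.3)), Skill("lift", (1.0, 0.8))]
samples = execute_plan(plan, "pick up the cup")
print(len(samples))  # 3
```

Discarding a whole trajectory on any failed step mirrors the article's point that only successful trajectories are labeled and used for fine-tuning.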

Key Points
  • VLAMotor is the first framework to both expose failures in VLA models and use them for automated model repair via agent-based data synthesis.
  • It triggers failures in 92.33% of test cases and improves test coverage by 18.93% over state-of-the-art tools.
  • Fine-tuning in simulation on data synthesized from failures raises simulation success rate by 49.25% and real-world manipulation success rate by 57.5%.

Why It Matters

Turns VLA model failures into low-cost training data for robust robot manipulation.