Robotics

Learning When to See and When to Feel: Adaptive Vision-Torque Fusion for Contact-Aware Manipulation

The adaptive model learns to ignore torque data when not in contact, boosting success rates.

Deep Dive

A research team led by Jiuzhou Lei and Chang Liu has published a systematic comparison study on integrating vision and force/torque (F/T) sensing for robotic manipulation. While vision-based policies are common, they falter in contact-rich tasks where tactile feedback is crucial. The paper systematically compares existing integration strategies—such as auxiliary prediction and mixture-of-experts—within diffusion-based manipulation policies, providing a much-needed benchmark for the field.

Their key contribution is a novel adaptive integration strategy. Instead of constantly fusing both data streams, their model learns to ignore F/T signals during non-contact phases and leverages the combined vision and torque information only when contact occurs. This 'contact-aware gating' mimics a more natural, efficient use of sensory input. Experimental results show the method delivers a 14% higher success rate than the best existing baseline, evidence that *when* to fuse data is as important as *how* to fuse it for robust robotic control.
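To make the gating idea concrete, here is a minimal NumPy sketch. It uses a hand-tuned sigmoid gate on wrench magnitude to suppress F/T features in free space; this is an illustration only — the paper's gate is learned end-to-end inside the policy, and all function names, thresholds, and feature shapes below are assumptions.

```python
import numpy as np

def contact_gate(wrench_magnitude, threshold=1.0, sharpness=10.0):
    """Soft gate in [0, 1]: near 0 in free space, near 1 under contact.
    Hypothetical fixed sigmoid; the paper learns this gating from data."""
    return 1.0 / (1.0 + np.exp(-sharpness * (wrench_magnitude - threshold)))

def fuse_features(vision_feat, ft_feat, ft_wrench):
    """Scale the F/T feature by contact likelihood, then concatenate
    with the vision feature to form the fused policy input."""
    g = contact_gate(np.linalg.norm(ft_wrench))
    return np.concatenate([vision_feat, g * ft_feat])

# Free space: near-zero wrench, so F/T features are effectively ignored.
free = fuse_features(np.ones(4), np.ones(4), np.zeros(6))
# In contact: large wrench, so F/T features pass through almost unscaled.
contact = fuse_features(np.ones(4), np.ones(4), np.full(6, 2.0))
```

In `free`, the last four entries are driven close to zero, while in `contact` they stay near one — the same signal stream is kept or discarded depending on contact state, which is the behavior the adaptive strategy learns.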

Key Points
  • Proposes an adaptive strategy that fuses vision and torque data only during contact phases, ignoring torque when not needed.
  • Outperforms the strongest existing baseline by 14% in task success rate in manipulation experiments.
  • Provides a comprehensive comparison study of different Force/Torque-vision integration strategies within diffusion-based policies.

Why It Matters

Enables more dexterous and reliable robots for precise assembly, healthcare, and other contact-sensitive real-world applications.