Learning When to See and When to Feel: Adaptive Vision-Torque Fusion for Contact-Aware Manipulation
The adaptive model learns to ignore torque data when not in contact, boosting success rates.
A research team led by Jiuzhou Lei and Chang Liu has published a comparison study on integrating vision and force/torque (F/T) sensing for robotic manipulation. While vision-based policies are common, they falter in contact-rich tasks where force feedback is crucial. The paper systematically compares existing integration strategies, such as auxiliary prediction and mixture-of-experts, within diffusion-based manipulation policies, providing a needed benchmark for the field.
Their key contribution is a novel adaptive integration strategy. Instead of constantly fusing both data streams, the model learns to ignore F/T signals during non-contact phases and only leverages the combined vision and F/T information once contact occurs. This 'contact-aware gating' mimics a more natural, efficient use of sensory input. Experimental results show the method achieves a 14% higher success rate than the best existing baseline, indicating that *when* to fuse data matters as much as *how* to fuse it for robust robotic control.
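The paper's exact architecture is not reproduced here, but a minimal sketch of the gating idea might look like the following PyTorch snippet. All module names, dimensions, and the specific gate design are illustrative assumptions, not the authors' implementation: the gate predicts a contact probability from the F/T features and scales them down toward zero when contact is unlikely.

```python
# Hypothetical sketch of contact-aware gating (not the authors' code).
# Assumes precomputed vision and force/torque (F/T) feature vectors; the gate
# learns to suppress the F/T branch when predicted contact probability is low.
import torch
import torch.nn as nn


class ContactAwareFusion(nn.Module):
    def __init__(self, vision_dim=256, ft_dim=64, fused_dim=256):
        super().__init__()
        # Predicts a contact probability in [0, 1] from the F/T features.
        self.contact_gate = nn.Sequential(
            nn.Linear(ft_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid(),
        )
        # Projects the concatenated modalities into one conditioning vector,
        # e.g. for a diffusion policy's denoising network.
        self.fuse = nn.Linear(vision_dim + ft_dim, fused_dim)

    def forward(self, vision_feat, ft_feat):
        gate = self.contact_gate(ft_feat)    # (B, 1) contact probability
        gated_ft = gate * ft_feat            # F/T features suppressed when gate ~ 0
        cond = self.fuse(torch.cat([vision_feat, gated_ft], dim=-1))
        return cond, gate


# Usage: during non-contact phases the gate should approach 0, so the policy
# conditions almost entirely on vision. If contact labels are available, the
# gate could be supervised with an auxiliary binary cross-entropy loss.
fusion = ContactAwareFusion()
v = torch.randn(8, 256)   # batch of vision embeddings
f = torch.randn(8, 64)    # batch of F/T embeddings
cond, gate = fusion(v, f)
```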
- Proposes an adaptive strategy that fuses vision and torque data only during contact phases, ignoring torque when not needed.
- Outperforms the strongest existing baseline by 14% in task success rate in manipulation experiments.
- Provides a comprehensive comparison study of different F/T-vision integration strategies within diffusion-based policies.
Why It Matters
Enables more dexterous and reliable robots for precise assembly, healthcare, and other contact-sensitive real-world applications.