Learning When to See and When to Feel: Adaptive Vision-Torque Fusion for Contact-Aware Manipulation
The adaptive model learns to ignore torque data when not in contact, boosting success rates.
A research team led by Jiuzhou Lei and Chang Liu has published a comparison study on integrating vision and force/torque (F/T) sensing for robotic manipulation. While vision-based policies are common, they falter in contact-rich tasks where force feedback is crucial. The paper systematically compares existing integration strategies, such as auxiliary prediction and mixture-of-experts, within diffusion-based manipulation policies, providing a needed benchmark for the field.
Their key contribution is a novel adaptive integration strategy. Instead of constantly fusing both data streams, the model learns to ignore F/T signals during non-contact phases and only leverages the combined vision and F/T information once contact occurs. This 'contact-aware gating' mimics a more natural, efficient use of sensory input. Experimental results show the method achieves a 14% higher success rate than the best existing baseline, indicating that *when* to fuse data matters as much as *how* to fuse it for robust robotic control.
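The paper's exact architecture is not reproduced here, but a minimal sketch of the gating idea might look like the following PyTorch snippet. All module names, dimensions, and the specific gate design are illustrative assumptions, not the authors' implementation: the gate predicts a contact probability from the F/T features and scales them down toward zero when contact is unlikely.

```python
# Hypothetical sketch of contact-aware gating (not the authors' code).
# Assumes precomputed vision and force/torque (F/T) feature vectors; the gate
# learns to suppress the F/T branch when predicted contact probability is low.
import torch
import torch.nn as nn


class ContactAwareFusion(nn.Module):
    def __init__(self, vision_dim=256, ft_dim=64, fused_dim=256):
        super().__init__()
        # Predicts a contact probability in [0, 1] from the F/T features.
        self.contact_gate = nn.Sequential(
            nn.Linear(ft_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid(),
        )
        # Projects the concatenated modalities into one conditioning vector,
        # e.g. for a diffusion policy's denoising network.
        self.fuse = nn.Linear(vision_dim + ft_dim, fused_dim)

    def forward(self, vision_feat, ft_feat):
        gate = self.contact_gate(ft_feat)    # (B, 1) contact probability
        gated_ft = gate * ft_feat            # F/T features suppressed when gate ~ 0
        cond = self.fuse(torch.cat([vision_feat, gated_ft], dim=-1))
        return cond, gate


# Usage: during non-contact phases the gate should approach 0, so the policy
# conditions almost entirely on vision. If contact labels are available, the
# gate could be supervised with an auxiliary binary cross-entropy loss.
fusion = ContactAwareFusion()
v = torch.randn(8, 256)   # batch of vision embeddings
f = torch.randn(8, 64)    # batch of F/T embeddings
cond, gate = fusion(v, f)
```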
- Proposes an adaptive strategy that fuses vision and torque data only during contact phases, ignoring torque when not needed.
- Outperforms the strongest existing baseline by 14% in task success rate in manipulation experiments.
- Provides a comprehensive comparison study of different F/T-vision integration strategies within diffusion-based policies.
Why It Matters
Enables more dexterous and reliable robots for precise assembly, healthcare, and other contact-sensitive real-world applications.