New phase-conditioned AI boosts robot T-shirt hanging from 56% to 87%
A dual-arm robot now hangs T-shirts with 87% success by detecting its own failures and recovering from them autonomously.
Standard imitation learning policies like Action Chunking with Transformers (ACT) rely on a Markovian assumption: they choose actions from the current observation alone. This causes state aliasing when visually similar observations require different actions, and it prevents autonomous failure recovery. To address this, a team of researchers led by Dayuan Chen and Kai Tang introduces a phase-conditioned, force-aware framework built on a closed-loop hierarchical architecture.
A FiLM-conditioned ACT encoder modulates feature extraction based on the current task phase, allowing a single unified policy to produce phase-specific behaviors while sharing action dynamics. A multi-modal phase predictor fusing visual, force, and pose feedback estimates the phase in real time, detecting contact failures invisible to vision alone and triggering autonomous recovery. A hybrid impedance controller enables compliant execution, paired with a haptic teleoperation interface for force-aware data collection. Ablation studies show FiLM-based modulation significantly outperforms baselines, and t-SNE analysis confirms well-separated phase-specific features. Validated on hanging and removing a T-shirt with dual arms, the closed-loop system improved hanging success from 56% to 87% through autonomous error recovery. Code and videos are available.
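To make the FiLM idea concrete, here is a minimal NumPy sketch of feature-wise linear modulation, in which each phase selects a per-channel scale (gamma) and shift (beta) applied to shared features. It is an illustration under stated assumptions, not the authors' implementation: the number of phases, the feature width, and the randomly initialized `gamma_table` and `beta_table` are all hypothetical stand-ins for parameters that would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_PHASES = 4    # hypothetical number of task phases
FEATURE_DIM = 8   # hypothetical encoder feature width

# Hypothetical per-phase scale and shift tables.
# In a trained model these would be learned, not sampled.
gamma_table = 1.0 + 0.1 * rng.normal(size=(NUM_PHASES, FEATURE_DIM))
beta_table = 0.1 * rng.normal(size=(NUM_PHASES, FEATURE_DIM))


def film(features: np.ndarray, phase: int) -> np.ndarray:
    """Feature-wise linear modulation: gamma[phase] * features + beta[phase].

    features has shape (num_tokens, FEATURE_DIM); the phase's scale and
    shift vectors broadcast across the token axis.
    """
    return gamma_table[phase] * features + beta_table[phase]


# The same shared features yield different outputs under different phases.
features = rng.normal(size=(5, FEATURE_DIM))
out_phase0 = film(features, 0)
out_phase1 = film(features, 1)
```

Because only the scale and shift depend on the phase, the rest of the network's weights stay shared, which is how a single policy can behave differently per phase.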
- FiLM-conditioned ACT encoder enables phase-specific behaviors from a single policy
- Multi-modal predictor (vision, force, pose) detects contact failures invisible to cameras
- T-shirt hanging success rate jumps from 56% to 87% via autonomous error recovery
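The closed-loop recovery behavior can be sketched with a toy example. Everything here is a hypothetical simplification: the phase names, the scalar `contact_force` summary, the 1.0 N threshold, and the rule-based `predict_phase` all stand in for the learned multi-modal predictor described above. The sketch only shows how force feedback can override a phase estimate that looks correct from vision and pose alone.

```python
from dataclasses import dataclass

PHASES = ["grasp", "hang", "release"]  # hypothetical phase names


@dataclass
class Observation:
    contact_force: float  # hypothetical scalar summary of force feedback, in newtons
    visual_phase: int     # phase index that vision and pose alone would suggest


def predict_phase(obs: Observation, min_contact_force: float = 1.0) -> int:
    """Toy stand-in for a multi-modal phase predictor.

    If vision and pose suggest the robot is past grasping but the force
    reading shows no contact, fall back to the grasp phase (index 0).
    """
    if obs.visual_phase > 0 and obs.contact_force < min_contact_force:
        return 0
    return obs.visual_phase


def select_behavior(obs: Observation) -> str:
    """Return the name of the phase whose behavior should run next."""
    return PHASES[predict_phase(obs)]


# Garment is held: continue with the phase that vision suggests.
holding = select_behavior(Observation(contact_force=5.0, visual_phase=1))
# Garment slipped: force drops, so the grasp is retried.
slipped = select_behavior(Observation(contact_force=0.2, visual_phase=1))
```

In this toy setup, the two observations share the same `visual_phase` and differ only in force, which is the kind of failure a camera-only policy could not distinguish.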
Why It Matters
Robust deformable-object manipulation brings household robots closer to autonomously handling tasks like laundry and assembly.