Robotics

New phase-conditioned AI boosts robot T-shirt hanging from 56% to 87%

arXiv cs.RO May 29, 2026

⚡ Robots can now hang T-shirts with 87% success by detecting and recovering from failures autonomously.

Deep Dive

Standard imitation learning policies like Action Chunking with Transformers (ACT) rely on a Markovian assumption, causing state aliasing when visually similar observations require different actions and preventing autonomous failure recovery. To address this, a team of researchers led by Dayuan Chen and Kai Tang introduces a phase-conditioned, force-aware framework that uses a closed-loop hierarchical architecture.

A FiLM-conditioned ACT encoder modulates feature extraction based on the current task phase, allowing a single unified policy to produce phase-specific behaviors while sharing action dynamics. A multi-modal phase predictor fusing visual, force, and pose feedback estimates the phase in real time, detecting contact failures invisible to vision alone and triggering autonomous recovery. A hybrid impedance controller enables compliant execution, paired with a haptic teleoperation interface for force-aware data collection. Ablation studies show FiLM-based modulation significantly outperforms baselines, and t-SNE analysis confirms well-separated phase-specific features. Validated on hanging and removing a T-shirt with dual arms, the closed-loop system improved hanging success from 56% to 87% through autonomous error recovery. Code and videos are available.
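To make the FiLM idea concrete, here is a minimal sketch of feature-wise linear modulation, where each feature is scaled and shifted by phase-dependent parameters (y = gamma * x + beta). It is an illustration under stated assumptions, not the authors' implementation: the phase count, feature width, and the random lookup table standing in for learned parameters are all hypothetical, and in a trained model gamma and beta would typically come from a learned network conditioned on the phase.

```python
import random

NUM_PHASES = 3    # hypothetical number of task phases
FEATURE_DIM = 4   # hypothetical width of the encoder feature vector


def make_film_table(num_phases, dim, seed=0):
    """Build one (gamma, beta) pair per phase.

    Random values stand in for parameters that would be learned in practice.
    """
    rng = random.Random(seed)
    table = []
    for _ in range(num_phases):
        gamma = [1.0 + rng.uniform(-0.5, 0.5) for _ in range(dim)]
        beta = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        table.append((gamma, beta))
    return table


def film(features, gamma, beta):
    """Feature-wise linear modulation: y_i = gamma_i * x_i + beta_i."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


def encode(features, phase, table):
    """Modulate shared features with the parameters of the given phase."""
    gamma, beta = table[phase]
    return film(features, gamma, beta)


table = make_film_table(NUM_PHASES, FEATURE_DIM)
obs_features = [0.2, -1.0, 0.5, 0.0]  # the same observation in every phase

for phase in range(NUM_PHASES):
    # Identical input features map to different encodings per phase.
    print(phase, encode(obs_features, phase, table))
```

The point of the sketch is that one set of shared features yields a different encoding for each phase, which is how a single policy can separate visually similar observations that call for different actions.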

Key Points

FiLM-conditioned ACT encoder enables phase-specific behaviors from a single policy
Multi-modal predictor (vision, force, pose) detects contact failures invisible to cameras
T-shirt hanging success rate jumps from 56% to 87% via autonomous error recovery
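
The recovery behavior in the points above can be sketched as a simple closed loop: a phase estimate is recomputed at every step, and when it falls behind the phase the policy expected, the system rolls back and retries. Everything here is a hypothetical stand-in: the phase names, the force threshold, and the rule-based `predict_phase` function, which replaces the learned multi-modal predictor described in the article.

```python
from dataclasses import dataclass

PHASES = ["grasp", "hang", "release"]  # hypothetical phase names
MIN_GRIP_FORCE = 1.0                   # hypothetical contact threshold (newtons)


@dataclass
class Observation:
    visual_ok: bool    # camera-based check looks nominal
    grip_force: float  # measured contact force


def predict_phase(expected_phase, obs, use_force=True):
    """Rule-based stand-in for a learned phase predictor.

    Returns the index of the phase the system appears to be in.
    """
    if not obs.visual_ok:
        return 0  # visible failure: restart from the first phase
    if use_force and expected_phase > 0 and obs.grip_force < MIN_GRIP_FORCE:
        return 0  # contact lost although the image looks fine
    return expected_phase


def run_episode(observations, use_force=True):
    """Step through observations, rolling back when the estimate regresses."""
    phase = 0
    log = []
    for obs in observations:
        estimated = predict_phase(phase, obs, use_force)
        if estimated < phase:
            log.append(f"recover: {PHASES[phase]} -> {PHASES[estimated]}")
            phase = estimated
        else:
            log.append(f"execute: {PHASES[phase]}")
            phase = min(phase + 1, len(PHASES) - 1)
    return log


# The second observation is a slip that the camera does not reveal.
observations = [
    Observation(True, 3.0),
    Observation(True, 0.2),
    Observation(True, 3.0),
    Observation(True, 3.0),
    Observation(True, 3.0),
]

with_force = run_episode(observations, use_force=True)
vision_only = run_episode(observations, use_force=False)
print(with_force)
print(vision_only)
```

With force feedback enabled the slip triggers a rollback to the grasp phase, while the vision-only run continues as if nothing happened, which mirrors the article's claim that some contact failures are invisible to cameras.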

Why It Matters

Robust deformable object manipulation brings household robots closer to handling tasks like laundry and assembly autonomously.
