Robotics

SELF-VLA: A Skill Enhanced Agentic Vision-Language-Action Framework for Contact-Rich Disassembly

arXiv cs.RO March 13, 2026

⚡ A new agentic VLA framework integrates explicit skills to handle contact-rich, long-horizon industrial disassembly tasks that challenge end-to-end VLA models.

Deep Dive

A team of researchers including Chang Liu, Sibo Tian, Xiao Liang, and Minghui Zheng has introduced SELF-VLA, an agentic Vision-Language-Action (VLA) framework designed for one of robotics' toughest challenges: automated disassembly of end-of-life electronics. Dismantling products such as smartphones or laptops involves high product variability, contact-rich interactions, and long action sequences. Current robotic systems struggle with all three, so automated disassembly tends to remain task-specific and reliant on human labor. SELF-VLA addresses this gap by integrating explicit, pre-defined disassembly skills (such as "unscrew" or "pry") into a flexible VLA model, creating an agent that can reason about and execute complex, multi-step disassembly procedures.

Experimental results show that SELF-VLA significantly outperforms state-of-the-art end-to-end VLA models on two contact-rich disassembly benchmark tasks. The key innovation is its hybrid approach: instead of learning everything end to end, the framework uses the VLA model as a high-level planner and reasoner that calls on a library of robust, lower-level skills for precise physical interactions. This division of labor makes the system more adaptable to the uncertainties of worn or damaged products and reduces the extensive data preparation typically required for industrial automation.
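The planner-plus-skill-library pattern described above can be illustrated with a toy sketch. This is not SELF-VLA's implementation: everything here is hypothetical, including `DeviceState`, the `stuck_attempts` failure model, `SKILL_LIBRARY`, `stub_planner`, and `run_agent`. A simple rule-based `stub_planner` stands in for the VLA model's high-level reasoning, and the `unscrew` and `pry` functions are placeholders for real contact-rich controllers. The point is only the control flow: the high-level planner picks a named skill, the skill executes and reports success or failure, and the planner re-plans from the updated state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical world state for a toy device: remaining screws and a cover.
@dataclass
class DeviceState:
    screws: int = 4
    cover_attached: bool = True
    stuck_attempts: int = 0  # hypothetical: unscrew attempts that will fail first
    log: List[str] = field(default_factory=list)

# A skill acts on the state and returns True on success.
Skill = Callable[[DeviceState], bool]

def unscrew(state: DeviceState) -> bool:
    """Placeholder low-level skill: remove one screw (may fail if stuck)."""
    if state.screws == 0:
        return False
    if state.stuck_attempts > 0:
        state.stuck_attempts -= 1  # simulated contact failure, e.g. a worn screw head
        return False
    state.screws -= 1
    state.log.append("unscrew")
    return True

def pry(state: DeviceState) -> bool:
    """Placeholder low-level skill: pry off the cover; fails while screws remain."""
    if state.screws > 0 or not state.cover_attached:
        return False
    state.cover_attached = False
    state.log.append("pry")
    return True

# Hypothetical skill library keyed by the names the planner emits.
SKILL_LIBRARY: Dict[str, Skill] = {"unscrew": unscrew, "pry": pry}

def stub_planner(state: DeviceState) -> Optional[str]:
    """Rule-based stand-in for the VLA planner: return the next skill name,
    or None once the goal (cover removed) is reached."""
    if not state.cover_attached:
        return None
    return "unscrew" if state.screws > 0 else "pry"

def run_agent(state: DeviceState,
              planner: Callable[[DeviceState], Optional[str]],
              skills: Dict[str, Skill],
              max_steps: int = 20) -> bool:
    """Plan, dispatch a skill, observe the result, and re-plan until done."""
    for _ in range(max_steps):
        name = planner(state)
        if name is None:
            return True  # goal reached
        skill = skills.get(name)
        if skill is None:
            raise KeyError(f"planner requested unknown skill: {name}")
        if not skill(state):
            # Record the failure; the next loop iteration re-plans from the new state.
            state.log.append(f"{name}:failed")
    return False  # step budget exhausted before reaching the goal
```

For example, `run_agent(DeviceState(screws=2, stuck_attempts=1), stub_planner, SKILL_LIBRARY)` retries the stuck screw, removes both screws, pries the cover, and returns `True`. Keeping skills behind a name-keyed registry means the planner only reasons over a small discrete vocabulary, while precise contact handling stays inside each skill.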

The research, detailed in arXiv paper 2603.11080, is a significant step toward practical robotic recycling and remanufacturing. By automating sequential, precise manipulation in unstructured environments, SELF-VLA points to a viable path for reducing e-waste and recovering valuable components without intensive human intervention. It also reflects a broader shift from rigid, pre-programmed robots toward more general, capable agentic systems.

Key Points

Integrates explicit disassembly skills into a Vision-Language-Action (VLA) model, creating a hybrid agentic framework for complex tasks.
Significantly outperforms existing end-to-end VLA models on contact-rich disassembly benchmarks, handling variability in end-of-life products.
Reduces the need for extensive task-specific data and training, moving toward generalizable automation for electronics recycling and remanufacturing.

Why It Matters

Points toward automated e-waste recycling at scale, reducing reliance on manual labor and improving recovery of critical materials.
