Robotics

Emergent Neural Automaton Policies: Learning Symbolic Structure from Visuomotor Trajectories

A new framework infers symbolic task structure from robot videos, boosting performance in low-data regimes.

Deep Dive

A team from Carnegie Mellon University has published a paper on arXiv introducing ENAP (Emergent Neural Automaton Policy), a framework aimed at the long-standing challenge of scaling robot learning to complex, long-horizon tasks. Its core contribution is the automatic inference of a symbolic, interpretable task structure, modeled as a Mealy state machine, directly from raw visuomotor demonstration data. The inference combines adaptive clustering, which turns continuous observations into discrete symbols, with an extension of the classic L* automata-learning algorithm. The resulting high-level symbolic planner captures latent task modes and the transitions between them, giving an explicit blueprint for task execution that purely neural policies often lack.
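To make the Mealy-machine idea concrete, here is a minimal sketch of what such a symbolic layer could look like once it has been learned. It is not the paper's implementation: the L* extension is not reproduced, and the transition table is written by hand rather than inferred. The nearest-centroid lookup merely stands in for the adaptive clustering step, and every state name, symbol, centroid, and feature value is a made-up assumption for a toy pick-and-place task.

```python
from dataclasses import dataclass, field


@dataclass
class MealyMachine:
    """Finite-state transducer whose output depends on the current state AND the input."""
    initial: str
    # (state, input_symbol) -> (next_state, output_symbol)
    transitions: dict = field(default_factory=dict)

    def run(self, inputs):
        """Consume a sequence of input symbols; return the final state and all outputs."""
        state, outputs = self.initial, []
        for symbol in inputs:
            state, out = self.transitions[(state, symbol)]
            outputs.append(out)
        return state, outputs


def nearest_centroid(feature, centroids):
    """Map a continuous feature vector to the label of the closest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(feature, centroids[label]))


# Hypothetical 2-D features and cluster centroids (illustrative values only).
centroids = {"far": (1.0, 0.0), "near": (0.1, 0.0), "grasped": (0.0, 1.0)}

# Hand-written transition table; in ENAP this structure is inferred from data.
machine = MealyMachine(
    initial="reach",
    transitions={
        ("reach", "far"): ("reach", "move_to_object"),
        ("reach", "near"): ("grasp", "close_gripper"),
        ("grasp", "near"): ("grasp", "close_gripper"),
        ("grasp", "grasped"): ("place", "move_to_goal"),
        ("place", "grasped"): ("place", "move_to_goal"),
    },
)

# Discretize a short trajectory of features, then read off the task modes.
features = [(0.9, 0.0), (0.7, 0.1), (0.15, 0.0), (0.05, 0.9), (0.0, 1.1)]
symbols = [nearest_centroid(f, centroids) for f in features]
final_state, modes = machine.run(symbols)
```

The property that makes a Mealy machine a natural fit here is that each output is attached to a (state, input) pair, so the same observation can trigger different task modes depending on where the robot is in the task.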

This inferred discrete structure then guides a low-level reactive residual network, which learns precise continuous control commands via behavior cloning. By explicitly separating discrete symbolic reasoning from continuous control, ENAP improves both sample efficiency and interpretability. In experiments on complex manipulation and long-horizon tasks, ENAP improved on state-of-the-art end-to-end visuomotor policies by up to 27%, particularly in low-data regimes. Crucially, it does so without any pre-defined, hand-crafted symbolic rules or task-specific labels, so the useful structure emerges directly from the data.
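The split between a discrete mode and a continuous correction can be sketched in a few lines. The example below is only an illustration under strong simplifying assumptions: actions and observations are scalars, each hypothetical mode has a fixed nominal action, and the "residual network" is a per-mode linear function trained by plain gradient descent on a mean-squared-error behavior-cloning loss. None of the names or numbers come from the paper.

```python
# Hypothetical nominal action per symbolic mode (a 1-D velocity command for brevity).
NOMINAL = {"move_to_object": 0.5, "close_gripper": 0.0, "move_to_goal": -0.5}


def policy(mode, obs, weights):
    """Nominal action selected by the discrete mode plus a learned linear residual."""
    w, b = weights[mode]
    return NOMINAL[mode] + (w * obs + b)


def bc_loss(demos, weights):
    """Behavior-cloning objective: mean squared error against demonstrated actions."""
    return sum((policy(m, o, weights) - a) ** 2 for m, o, a in demos) / len(demos)


def bc_step(demos, weights, lr=0.1):
    """One gradient-descent step on the MSE loss, updating each mode's residual."""
    grads = {m: [0.0, 0.0] for m in weights}
    for mode, obs, action in demos:
        err = policy(mode, obs, weights) - action
        grads[mode][0] += 2 * err * obs / len(demos)  # d(loss)/dw for this mode
        grads[mode][1] += 2 * err / len(demos)        # d(loss)/db for this mode
    return {m: (w - lr * grads[m][0], b - lr * grads[m][1])
            for m, (w, b) in weights.items()}


# Toy demonstrations: (mode, scalar observation, expert action). Values are made up.
demos = [
    ("move_to_object", 1.0, 0.7),
    ("move_to_object", 0.5, 0.6),
    ("close_gripper", 0.1, 0.05),
    ("move_to_goal", 1.0, -0.4),
]

# Start with a zero residual, so the policy initially outputs only the nominal action.
weights = {mode: (0.0, 0.0) for mode in NOMINAL}
initial_loss = bc_loss(demos, weights)
for _ in range(200):
    weights = bc_step(demos, weights)
final_loss = bc_loss(demos, weights)
```

Because the mode already determines the rough action, the residual only has to learn a small correction per mode, which is one plausible reading of why this kind of decomposition can help when demonstrations are scarce.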

Key Points
  • ENAP infers a symbolic Mealy state machine from visual demonstrations using adaptive clustering and an extension of the L* algorithm.
  • The framework outperforms leading end-to-end visuomotor policies by up to 27% on complex tasks, especially with limited data.
  • It provides an interpretable representation of robotic intent and task structure without needing hand-crafted symbolic priors.

Why It Matters

This approach could substantially reduce the amount of demonstration data and hand engineering needed to train robots for intricate, multi-step real-world tasks.