Robotics

TAIL-Safe: Task-Agnostic Safety Monitoring for Imitation Learning Policies

Flow-matching policies fail under perturbations, but TAIL-Safe's Q-function steers them back.

Deep Dive

Imitation learning (IL) policies like flow-matching and diffusion models excel at complex manipulation but are notoriously brittle: even within their training distribution, they fail due to sensitivity to initial conditions and compounding drift. This makes real-world deployment unsafe when out-of-distribution scenarios arise. TAIL-Safe, developed by Ahmed and Begum, introduces a principled safety monitor that defines a 'safe set' of state-action pairs where the policy empirically succeeds. The system learns a Lipschitz-continuous Q-value function that maps each pair to a long-term safety score based on three task-agnostic criteria: whether the object is visible, recognizable, and graspable. The zero-superlevel set of this function defines an invariant safe region.
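The safe-set construction above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: `q_safety` is a hypothetical smooth stand-in for the learned Lipschitz-continuous Q-function (which in TAIL-Safe aggregates visibility, recognizability, and graspability), and the membership test simply checks the zero-superlevel condition Q(s, a) ≥ 0.

```python
import numpy as np

def q_safety(state, action):
    """Hypothetical stand-in for the learned Lipschitz-continuous Q-function.

    In TAIL-Safe the score reflects three task-agnostic criteria
    (visibility, recognizability, graspability); here we fake it with a
    smooth function that is positive near a nominal safe region and
    negative far from it, purely for illustration.
    """
    z = np.concatenate([state, action])
    return 1.0 - np.dot(z, z)

def in_safe_set(state, action, q=q_safety):
    """Membership test: the safe set is the zero-superlevel set {(s, a) : Q(s, a) >= 0}."""
    return q(state, action) >= 0.0
```

With this toy Q-function, a state-action pair near the origin is inside the safe set, while a distant one is not, e.g. `in_safe_set(np.zeros(3), np.zeros(2))` is `True` and `in_safe_set(2 * np.ones(3), np.zeros(2))` is `False`.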

When the policy proposes an action outside this safe set, TAIL-Safe triggers a recovery mechanism inspired by Nagumo's theorem: it applies gradient ascent on the Q-function to pull the policy back to safety. To train the Q-function without risking real hardware, the authors build a high-fidelity digital twin using Gaussian Splatting, enabling systematic failure data collection. Experiments with a Franka Emika robot show that flow-matching policies, which normally fail under runtime perturbations, achieve consistent task success with TAIL-Safe. This approach is task-agnostic, meaning it works across different learned tasks without retraining the safety monitor.
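The recovery step can be sketched as gradient ascent on the safety score over the action. Again this is a hedged illustration, not the authors' code: `q_safety` is the same kind of hypothetical stand-in for the learned Q-function, and the gradient is taken by finite differences rather than autodiff. The loop nudges an unsafe proposed action uphill on Q until it re-enters the zero-superlevel set.

```python
import numpy as np

def q_safety(state, action):
    # Hypothetical smooth stand-in for the learned Q-function:
    # positive near a nominal safe region, negative outside it.
    z = np.concatenate([state, action])
    return 1.0 - np.dot(z, z)

def action_grad(q, state, action, eps=1e-5):
    """Finite-difference gradient of Q with respect to the action."""
    g = np.zeros_like(action)
    for i in range(action.size):
        d = np.zeros_like(action)
        d[i] = eps
        g[i] = (q(state, action + d) - q(state, action - d)) / (2 * eps)
    return g

def recover(state, action, q=q_safety, step=0.1, max_iters=200):
    """Gradient-ascent recovery: pull an unsafe action back into {Q >= 0}."""
    a = action.copy()
    for _ in range(max_iters):
        if q(state, a) >= 0.0:  # already inside the safe set, stop
            break
        a = a + step * action_grad(q, state, a)
    return a
```

For example, with `state = np.zeros(2)` and the unsafe proposal `np.array([2.0, 0.0])` (Q = -3), a few ascent steps shrink the action toward the safe region until Q turns non-negative.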

Key Points
  • TAIL-Safe uses a Lipschitz-continuous Q-function scoring three task-agnostic criteria: visibility, recognizability, and graspability.
  • A recovery mechanism inspired by Nagumo's theorem applies gradient ascent on the Q-function to steer the policy back into the safe set.
  • Gaussian Splatting digital twin enables systematic failure data collection without physical robot risk.

Why It Matters

Enables safer deployment of imitation learning robots in real-world settings where distribution shift is common.