Barak Or's Physical Admissibility Filters Reach Up to 0.98 AUC for Robot Predictions
Low RMSE doesn't guarantee that robot motions are physically executable. In replay experiments, the new gate blocks 87–89% of invalid proposals.
A new paper by Barak Or tackles a critical blind spot in robotics AI: predictive models that output state rollouts or action chunks often look accurate on RMSE but generate physically impossible motions. The author proposes a formal "physical admissibility" interface that treats decoded proposals as candidate dynamics and evaluates them against kinematic, dynamic, and direct-to-composed horizon conditions before execution. Passing doesn't guarantee task success, but rejection gives a component-level reason for why the proposal violates the physical envelope. The method is tested on the Hugging Face LeRobot PushT dataset, a standard benchmark for robotic manipulation.
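To make the idea of a gate with component-level reasons concrete, here is a minimal Python sketch. It is not the paper's implementation: the function name `check_admissibility`, the 1-D position rollout format, and the three stand-in checks (a velocity bound, an acceleration bound, and an endpoint gap between a direct and a composed rollout) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    admissible: bool
    # Maps a violated condition name to the worst offending value.
    violations: dict = field(default_factory=dict)


def check_admissibility(states, dt, v_max, a_max, horizon_tol, composed=None):
    """Hypothetical gate over a proposed rollout of 1-D positions.

    The thresholds and the three checks are illustrative stand-ins,
    not the conditions defined in the paper.
    """
    violations = {}

    # Kinematic stand-in: finite-difference velocity must stay within v_max.
    vel = [(b - a) / dt for a, b in zip(states, states[1:])]
    worst_v = max((abs(v) for v in vel), default=0.0)
    if worst_v > v_max:
        violations["kinematic"] = worst_v

    # Dynamic stand-in: finite-difference acceleration must stay within a_max.
    acc = [(b - a) / dt for a, b in zip(vel, vel[1:])]
    worst_a = max((abs(x) for x in acc), default=0.0)
    if worst_a > a_max:
        violations["dynamic"] = worst_a

    # Horizon stand-in: the direct multi-step prediction should end near
    # the endpoint obtained by composing one-step predictions.
    if composed is not None:
        gap = abs(states[-1] - composed[-1])
        if gap > horizon_tol:
            violations["horizon"] = gap

    return GateResult(admissible=not violations, violations=violations)


smooth = [0.0, 0.1, 0.2, 0.3]
jump = [0.0, 0.1, 1.5, 1.6]  # teleports between steps 1 and 2
print(check_admissibility(smooth, 0.1, 2.0, 5.0, 0.05).admissible)
print(sorted(check_admissibility(jump, 0.1, 2.0, 5.0, 0.05).violations))
```

A rejected proposal carries the names of the conditions it failed, which is the kind of component-level reason the summary describes. An accepted proposal is only certified against these checks, not against task success.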
In the reported results, one-step prediction-RMSE and standardized dynamics residuals reach AUC scores of 0.982 and 0.972, while kinematic-only conditions lag at 0.592. The full combined gate achieves an AUC of 0.957, slightly below the best single filters, but adds per-condition attribution. In replay-based intervention experiments, the residual-based filters and the full gate prevent 87–89% of invalid proposals while keeping mean progress near 0.998, so filtering costs almost no task progress. The work offers a practical, component-level feasibility check that could be integrated into predictive physical AI systems to reduce waste and risk in real-world deployment.
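For readers unfamiliar with the two kinds of numbers quoted here, the sketch below shows how an AUC and a "fraction of invalid proposals prevented" could be computed from filter scores. The helper names `roc_auc` and `blocked_fraction`, the label convention (1 = invalid proposal), and the toy scores are assumptions for illustration only; they are not the paper's evaluation code or data.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random invalid proposal (label 1)
    scores higher than a random valid one (label 0); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one valid and one invalid proposal")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def blocked_fraction(scores, labels, threshold):
    """Share of invalid proposals whose score exceeds the rejection threshold."""
    invalid = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s > threshold for s in invalid) / len(invalid)


# Toy residual scores (made up): higher means "more likely inadmissible".
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels))                # 0.75
print(blocked_fraction(scores, labels, 0.5))  # 0.5
```

AUC summarizes ranking quality across all thresholds, whereas the prevented fraction depends on one chosen operating threshold, which is why an article can report both.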
- Proposes physical admissibility gate with kinematic, dynamic, and horizon conditions; tested on Hugging Face LeRobot PushT dataset.
- One-step prediction-RMSE and dynamics residuals achieve AUC 0.982 and 0.972; full gate reaches AUC 0.957 with per-condition attribution.
- Replay experiments show the residual-based filters and full gate prevent 87–89% of invalid proposals while keeping mean task progress near 0.998.
Why It Matters
Provides a pre-execution check that flags physically inadmissible robot predictions and explains which condition failed, which could reduce costly failures in real-world deployment of AI-driven automation.