VISION-SLS: Safe Perception-Based Control from Learned Visual Representations via System Level Synthesis
New method guarantees safety for robots using only camera input, tested on hardware.
VISION-SLS is a new method for safe perception-based control of robots using only high-resolution RGB images. Developed by researchers at ETH Zurich and other institutions, it addresses the challenge of providing safety guarantees when robots operate under partial observability, sensor noise, and nonlinear dynamics. The approach combines a learned low-dimensional observation map, built from pretrained visual features and equipped with state-dependent error bounds, with a causal affine time-varying output-feedback policy optimized via System Level Synthesis (SLS). A key innovation is a scalable solver based on sequential convex programming with efficient Riccati recursions.
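To make the policy class concrete, here is a minimal sketch of a causal affine time-varying output-feedback controller of the kind SLS optimizes over. All names, dimensions, and gain values below are illustrative assumptions, not the authors' implementation: at each time step, the control is the nominal input plus lower-triangular-in-time feedback on past measurement deviations.

```python
import numpy as np

rng = np.random.default_rng(0)
T, nu, ny = 5, 2, 3  # horizon, control dim, measurement dim (hypothetical)

u_nom = rng.standard_normal((T, nu))  # nominal control sequence
y_nom = rng.standard_normal((T, ny))  # nominal (predicted) measurements
# Gains K[t][s] are only defined for s <= t, which enforces causality:
# u_t may depend on measurements up to time t, never on future ones.
K = [[rng.standard_normal((nu, ny)) for s in range(t + 1)] for t in range(T)]

def policy(t, y_history):
    """Causal affine feedback: u_t = u_nom[t] + sum_{s<=t} K[t][s] (y_s - y_nom[s])."""
    assert len(y_history) == t + 1
    u = u_nom[t].copy()
    for s in range(t + 1):
        u += K[t][s] @ (y_history[s] - y_nom[s])
    return u

# If measurements match the nominal prediction exactly, the deviations vanish
# and the policy returns the nominal control.
u2 = policy(2, [y_nom[0], y_nom[1], y_nom[2]])
print(np.allclose(u2, u_nom[2]))
```

The lower-triangular structure is what makes the synthesis problem convex in the closed-loop system responses under SLS, while remaining implementable online as a causal filter over the measurement stream.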
In experiments, VISION-SLS was evaluated on three simulated visuomotor tasks: a 4D car and a 10D quadrotor, both observed through >=512x512-pixel images, and a 59D humanoid under partial observability. The method produced safe, information-gathering behavior that actively reduces uncertainty while guaranteeing constraint satisfaction. Crucially, it was also validated on real hardware, safely steering a ground vehicle using only onboard camera images and outperforming baseline methods in safety rate and solve time. These results demonstrate that SLS-based safe visuomotor control can be practical at scale.
- Uses only high-resolution RGB images (>=512x512 pixels) for control; no other sensors are needed
- Provides robust safety guarantees with calibrated uncertainty bounds despite noise and nonlinear dynamics
- Validated on real hardware (ground vehicle) and outperformed baselines in safety rate and solve times
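One standard way to obtain calibrated uncertainty bounds like those in the second highlight is split conformal prediction over perception errors. The sketch below is a simplified, constant-radius variant under an exchangeability assumption; the paper's bounds are state-dependent, and every name and number here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Suppose a learned observation map predicts a low-dimensional state z_hat
# from an image. On a held-out calibration set we also know the true z.
n_cal = 200
z_true = rng.standard_normal((n_cal, 2))
z_hat = z_true + 0.1 * rng.standard_normal((n_cal, 2))  # noisy predictions

# Nonconformity score: per-sample prediction error norm.
scores = np.linalg.norm(z_hat - z_true, axis=1)

# Conformal quantile at miscoverage alpha: with probability >= 1 - alpha,
# a fresh prediction's error falls within this radius.
alpha = 0.1
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
radius = np.sort(scores)[min(k, n_cal) - 1]

# The radius can then back a robust constraint: the controller treats the
# true state as lying in a ball of this size around the perception output.
print(f"calibrated error radius at {1 - alpha:.0%} coverage: {radius:.3f}")
```

Tightening the bound as a function of the predicted state (rather than a single global radius) is what lets the controller be less conservative where perception is reliable.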
Why It Matters
Makes safe robot control from cameras practical, enabling cheaper, more reliable autonomous systems.