Robotics

FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution

New model processes long video streams in real time, achieving 99.2% success on robot tasks.

Deep Dive

Researchers from Tsinghua University and collaborating institutions developed FUTURE-VLA, a unified vision-language-action model for robots. It uses temporal compression and latent-space autoregression to process extensive multi-view observation histories while keeping inference latency constant. The system achieved a 99.2% success rate on the LIBERO benchmarks and extended the spatiotemporal window 16x over baselines, enabling real-time future forecasting and human-in-the-loop validation through interactive execution gating.
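To make the two core ideas concrete, here is a minimal, hypothetical sketch of (a) compressing a growing observation history into a fixed-size buffer so per-step cost stays constant, and (b) gating execution on human approval of forecast actions. The class and function names, the pairwise-averaging compression scheme, and the use of plain floats as stand-in latents are illustrative assumptions, not details from the paper.

```python
class TemporalCompressor:
    """Keep at most `capacity` latent entries. When the buffer is full,
    average adjacent pairs, halving the entry count and doubling the
    time stride. Older history is thus kept at coarser resolution,
    and the context size (hence inference cost) stays bounded.
    NOTE: illustrative sketch, not the paper's actual mechanism."""

    def __init__(self, capacity=16):
        self.capacity = capacity  # use an even capacity so pairs merge cleanly
        self.buffer = []

    def add(self, latent):
        if len(self.buffer) == self.capacity:
            # Merge adjacent entries: 16 latents -> 8 coarser latents.
            self.buffer = [(a + b) / 2
                           for a, b in zip(self.buffer[::2], self.buffer[1::2])]
        self.buffer.append(latent)

    def context(self):
        # Constant-size context regardless of how many frames were seen.
        return list(self.buffer)


def gated_execute(forecast_actions, approve):
    """Preview forecast actions and execute only those the human-in-the-loop
    callback approves; otherwise execute nothing (hypothetical gate)."""
    return forecast_actions if approve(forecast_actions) else []
```

For example, after streaming 1,000 frames the compressor still holds at most 16 entries, so the model's context window, and therefore its latency, does not grow with history length.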

Why It Matters

Enables safer, more predictable autonomous robots that can preview planned actions for human approval before executing them.