Robotics

Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation

New AI model for robots uses two memory systems to achieve 2.9x faster inference and large gains in manipulation success rates.

Deep Dive

A research team led by Zaijing Li has introduced OptimusVLA, a novel dual-memory architecture designed to overcome critical bottlenecks in robotic manipulation AI. Current Vision-Language-Action (VLA) models suffer from low inference efficiency, because the distribution gap between the noise prior and the target actions forces a long generative path, and from poor robustness, because they ignore historical context. OptimusVLA addresses both problems with two specialized memory systems: a Global Prior Memory that retrieves task-level priors from similar past trajectories to shorten the generative path, and a Local Consistency Memory that models executed action sequences to enforce temporal coherence and smoothness.
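To make the mechanism concrete, here is a minimal Python sketch of how such a dual-memory loop could work. It is not the paper's implementation: the class and function names (GlobalPriorMemory, LocalConsistencyMemory, generate_action), the cosine-similarity retrieval, the blend weight, and the step counts are all illustrative assumptions standing in for the learned components described above.

```python
# Minimal sketch (not the authors' code) of the dual-memory idea, assuming:
# - the Global Prior Memory is a nearest-neighbor store of (task embedding,
#   action chunk) pairs, whose retrieved chunk replaces the pure-noise prior
#   a generative action head would otherwise start from;
# - the Local Consistency Memory is a short buffer of executed actions used
#   to smooth each new chunk toward recent motion.
import numpy as np


class GlobalPriorMemory:
    """Retrieves a task-level action prior from similar past trajectories."""

    def __init__(self):
        self.keys = []    # task embeddings of stored trajectories
        self.values = []  # corresponding action chunks, shape (horizon, dof)

    def add(self, task_emb, action_chunk):
        self.keys.append(np.asarray(task_emb, dtype=np.float64))
        self.values.append(np.asarray(action_chunk, dtype=np.float64))

    def retrieve(self, task_emb):
        """Return the stored chunk whose key is most similar (cosine)."""
        if not self.keys:
            return None
        q = np.asarray(task_emb, dtype=np.float64)
        sims = [k @ q / (np.linalg.norm(k) * np.linalg.norm(q) + 1e-8)
                for k in self.keys]
        return self.values[int(np.argmax(sims))]


class LocalConsistencyMemory:
    """Keeps recently executed actions to enforce temporal smoothness."""

    def __init__(self, maxlen=8, blend=0.3):
        self.buffer = []
        self.maxlen = maxlen
        self.blend = blend  # weight given to recent motion

    def push(self, action):
        self.buffer.append(np.asarray(action, dtype=np.float64))
        self.buffer = self.buffer[-self.maxlen:]

    def smooth(self, action_chunk):
        """Blend the first predicted step toward the last executed action."""
        if not self.buffer:
            return action_chunk
        out = action_chunk.copy()
        out[0] = self.blend * self.buffer[-1] + (1.0 - self.blend) * out[0]
        return out


def generate_action(task_emb, prior_mem, local_mem, horizon=16, dof=7,
                    steps_from_prior=2, steps_from_noise=10):
    """Stand-in for a diffusion/flow action head: starting from a retrieved
    prior instead of pure noise lets it run far fewer refinement steps."""
    prior = prior_mem.retrieve(task_emb)
    if prior is not None:
        chunk, steps = prior.copy(), steps_from_prior
    else:
        chunk, steps = np.random.randn(horizon, dof), steps_from_noise
    for _ in range(steps):           # toy refinement loop standing in for
        chunk = chunk - 0.1 * chunk  # a learned denoiser / velocity field
    return local_mem.smooth(chunk)


if __name__ == "__main__":
    gpm, lcm = GlobalPriorMemory(), LocalConsistencyMemory()
    gpm.add(task_emb=np.ones(4), action_chunk=np.zeros((16, 7)))
    lcm.push(np.full(7, 0.5))
    chunk = generate_action(np.ones(4), gpm, lcm)
    print(chunk.shape)  # (16, 7): retrieved prior, refined, then smoothed
```

The speedup claim maps onto the step counts in this sketch: a retrieved prior that already lies near the target action distribution needs only a few refinement steps, whereas pure noise needs many, which is one plausible reading of how a shorter generative path yields faster inference.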

The results are strong across multiple benchmarks. In simulation, OptimusVLA achieved a 98.6% average success rate on LIBERO, improved over the baseline π₀ by 13.5% on CALVIN, and attained a 38% success rate on the challenging RoboTwin 2.0 Hard setting. Real-world evaluations were even stronger: the model ranked best on generalization and long-horizon tasks, surpassing π₀ by 42.9% and 52.4% respectively, while delivering a 2.9x inference speedup. This represents a significant step toward more efficient, reliable, and context-aware robotic systems capable of complex manipulation tasks.

Key Points
  • Achieved 98.6% success rate on LIBERO benchmark and 2.9x faster inference in real-world tests
  • Introduced dual-memory system: Global Prior Memory for task priors and Local Consistency Memory for temporal coherence
  • Surpassed baseline π₀ by 42.9% on generalization tasks and 52.4% on long-horizon tasks in real-world evaluation

Why It Matters

Enables more reliable and efficient robots for manufacturing, logistics, and home assistance by dramatically improving success rates and speed.