RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies
New benchmark uses 16 manipulation tasks to compare memory strategies for robot AI.
A research team from institutions including the University of Michigan and Stanford has introduced RoboMME, a benchmark designed to systematically evaluate how AI models for robots use memory. The core problem is that today's vision-language-action (VLA) models, which control robots, struggle with tasks that require memory, such as counting actions or tracking objects that move out of sight. RoboMME addresses this with a standardized, large-scale testbed of 16 manipulation tasks, grouped into categories that specifically challenge temporal, spatial, object, and procedural memory. This allows for the first apples-to-apples comparisons of different memory architectures, moving the field beyond narrow, non-standardized evaluations.
The team used RoboMME to test 14 memory-augmented variants built on the π0.5 model backbone, exploring various memory representations and integration strategies. A key finding is that no single memory design is universally superior; performance depends heavily on the specific task. For instance, one representation might excel at counting but fail at tracking occluded objects, so a memory design has to be matched to the kind of memory a task demands. By open-sourcing the benchmark, code, and videos, the researchers give the community a tool to measure, compare, and ultimately advance the memory capabilities of generalist robotic policies, a step toward robots that can perform complex, long-horizon tasks in the real world.
- Introduces RoboMME, a benchmark with 16 standardized tasks testing four types of robotic memory (temporal, spatial, object, procedural).
- Systematically tests 14 memory-augmented variants of the π0.5 VLA model, finding performance is highly task-dependent with no single best design.
- Provides an open-source framework for the research community to measure and compare memory architectures in robot AI, enabling faster progress.
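To make the "no single best design" finding concrete, here is a minimal Python sketch of how one might check whether any variant wins in all four memory categories. It is not the RoboMME codebase: the function names, the variant names, and the score values are hypothetical placeholders chosen for illustration, not results from the paper.

```python
from typing import Dict, Optional

# The four memory categories named in the summary above.
CATEGORIES = ("temporal", "spatial", "object", "procedural")

# Maps variant name -> category -> success rate in [0, 1].
Scores = Dict[str, Dict[str, float]]


def best_per_category(scores: Scores) -> Dict[str, str]:
    """Return the top-scoring variant for each memory category."""
    return {
        cat: max(scores, key=lambda variant: scores[variant][cat])
        for cat in CATEGORIES
    }


def dominant_variant(scores: Scores) -> Optional[str]:
    """Return the variant that is best in every category, or None."""
    winners = set(best_per_category(scores).values())
    return winners.pop() if len(winners) == 1 else None


# Placeholder numbers for illustration only; NOT results from the paper.
placeholder_scores: Scores = {
    "variant_a": {"temporal": 0.8, "spatial": 0.3, "object": 0.5, "procedural": 0.4},
    "variant_b": {"temporal": 0.4, "spatial": 0.7, "object": 0.6, "procedural": 0.5},
}

if __name__ == "__main__":
    print(best_per_category(placeholder_scores))
    print(dominant_variant(placeholder_scores))
```

With these placeholder values, a different variant leads in different categories, so `dominant_variant` returns `None`. That is the pattern the summary describes: a design that is strong on one kind of memory can be weak on another.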
Why It Matters
Gives researchers a standardized way to evaluate memory in robot AI, a prerequisite for building policies that can remember and complete complex, multi-step tasks reliably.