Robotics

DIM-WAM boosts robot task success from 28% to 70% with memory augmentation

arXiv cs.RO June 29, 2026

⚡ New memory-augmented world-action model achieves 91.5% stage success on real robots

Deep Dive

World-action models jointly predict future visual states and actions, but existing methods struggle with long-horizon tasks whose later steps depend on earlier observations and on how far the task has progressed. Researchers introduce DIM-WAM, a memory-augmented approach with three parts. It extracts compact visual event information from real observations. It updates multiple memory banks, each through its own similarity-based merging. And it reads long-term context, tagged with bank-identity and time embeddings, to condition both video and action denoising. A progress-supervision objective additionally pushes the memory tokens to encode not only completed historical events but also the current task stage and its implications.
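
To make the idea of independent, similarity-based merging more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `MemoryBank` class, the cosine-similarity criterion, the rule of merging the most similar adjacent pair when a bank overflows, and the count-weighted averaging are all hypothetical choices. The article does not specify how merging, bank sizes, or embeddings are actually defined.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class MemoryBank:
    """Hypothetical fixed-capacity bank (capacity >= 1); not the paper's code."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.slots = []  # each slot: [vector, mean timestamp, number of merged events]

    def write(self, vec, t):
        """Append a new event token; merge if the bank overflows."""
        self.slots.append([list(vec), float(t), 1])
        if len(self.slots) > self.capacity:
            self._merge_most_similar_neighbors()

    def _merge_most_similar_neighbors(self):
        # Assumption: merge the temporally adjacent pair with the highest cosine similarity.
        i = max(
            range(len(self.slots) - 1),
            key=lambda k: cosine(self.slots[k][0], self.slots[k + 1][0]),
        )
        (va, ta, ca), (vb, tb, cb) = self.slots[i], self.slots[i + 1]
        n = ca + cb
        merged_vec = [(x * ca + y * cb) / n for x, y in zip(va, vb)]
        merged_time = (ta * ca + tb * cb) / n
        self.slots[i:i + 2] = [[merged_vec, merged_time, n]]


def read_context(banks):
    """Collect (bank_id, time, vector) triples, standing in for bank-identity and time tags."""
    return [
        (bank_id, t, vec)
        for bank_id, bank in enumerate(banks)
        for vec, t, _count in bank.slots
    ]


# Two banks of different capacity act as a coarse and a fine memory scale.
banks = [MemoryBank(2), MemoryBank(4)]
events = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
for step, event in enumerate(events):
    for bank in banks:  # each bank is updated independently
        bank.write(event, step)
context = read_context(banks)
```

In this sketch the smaller bank is forced to merge more aggressively, so it keeps a coarser summary of the episode, while the larger bank retains finer-grained events. In a real model the `(bank_id, time)` tags would presumably be turned into learned embeddings, which this sketch does not attempt.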

On the RMBench benchmark, DIM-WAM achieved 69.8% average success, up from the 28.4% of the LingBot-VA baseline and ahead of the explicit-memory Mem-0 baseline at 42.0%. In real-world experiments on four Franka robot tasks, stage success improved from 70.7% to 91.5% and full-task success from 52.5% to 80.0%. These results suggest that retaining and using multi-scale historical context is an important ingredient for robots that must handle complex, sequentially dependent manipulation tasks.
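
For readers who want the size of these gains at a glance, the short Python snippet below computes the absolute improvement in percentage points and the ratio to the baseline. The numbers are taken directly from the figures reported above; the dictionary labels and the `gains` helper are illustrative names only.

```python
# (baseline %, DIM-WAM %) pairs as reported in the article.
results = {
    "RMBench vs LingBot-VA baseline": (28.4, 69.8),
    "RMBench vs Mem-0 baseline": (42.0, 69.8),
    "Franka stage success": (70.7, 91.5),
    "Franka full-task success": (52.5, 80.0),
}


def gains(baseline, new):
    """Return (absolute gain in percentage points, ratio of new to baseline)."""
    return round(new - baseline, 1), round(new / baseline, 2)


for label, (baseline, new) in results.items():
    points, ratio = gains(baseline, new)
    print(f"{label}: +{points} points, {ratio}x baseline")
```

The RMBench gain over LingBot-VA works out to 41.4 percentage points, roughly 2.46 times the baseline success rate, while the real-robot gains are smaller in relative terms because those baselines start much higher.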

Key Points

Achieves 69.8% success on RMBench, up from 28.4% baseline and 42% Mem-0 explicit-memory baseline
Real-world Franka tasks: 91.5% stage success and 80% full-task success, vs 70.7% and 52.5% baselines
Uses multi-scale memory banks with similarity-based merging and a progress-supervision objective to track task stage

Why It Matters

Lets robots retain long-horizon task context, which is critical for complex real-world manipulation.
