Research & Papers

AURA-Mem: action-gated memory slashes robot VRAM 6000x with constant 4KB state

Robot AI inference state stays at 4,224 bytes while the KV-cache grows to 6,061x that size at 100,000 steps

Deep Dive

A new paper from Josef Chen tackles a fundamental mismatch between datacenter inference and embodied agents. Current robot policies rely on the KV-cache, which suits short, batched queries but scales poorly over long-running robot episodes on bandwidth-limited edge hardware. The cache grows linearly with the number of steps, consuming scarce high-bandwidth memory and flash endurance. AURA-Mem (Action-Utility Recurrent Adaptive Memory) replaces it with a frozen vision-language-action backbone wrapped in a constant-size recurrent memory and a learned gate that writes only when the current observation would change the next action. The result is a fixed inference state of 4,224 bytes no matter how long the robot runs, whereas the KV-cache reaches 6,061 times that size at 100,000 steps.
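The gating idea can be illustrated with a toy sketch. This is not the paper's implementation: the linear `policy`, the blend-style memory update, the `threshold` and `decay` values, and every name below are assumptions made for illustration. The gate commits a write only when folding the current observation into memory would shift the predicted action by more than the threshold, so the state keeps a fixed size regardless of episode length.

```python
from dataclasses import dataclass, field

STATE_DIM = 4  # hypothetical memory width, far smaller than the paper's 4,224-byte state


def policy(memory, obs):
    """Toy stand-in for a frozen backbone: a fixed linear readout of memory and observation."""
    return [0.5 * m + 0.5 * o for m, o in zip(memory, obs)]


@dataclass
class GatedMemory:
    threshold: float = 0.05  # assumed minimum action change that justifies a write
    decay: float = 0.8       # assumed recurrent blend factor
    state: list = field(default_factory=lambda: [0.0] * STATE_DIM)
    writes: int = 0

    def candidate(self, obs):
        """Memory that would result from writing obs (same size as the current state)."""
        return [self.decay * s + (1 - self.decay) * o for s, o in zip(self.state, obs)]

    def step(self, obs):
        """Return the action, writing to memory only if the write would change it."""
        action_keep = policy(self.state, obs)
        new_state = self.candidate(obs)
        action_write = policy(new_state, obs)
        change = max(abs(a - b) for a, b in zip(action_keep, action_write))
        if change > self.threshold:
            self.state = new_state
            self.writes += 1
            return action_write
        return action_keep


mem = GatedMemory()
# A static scene for 50 steps, then one abrupt change that persists for 50 more.
observations = [[0.0] * STATE_DIM] * 50 + [[1.0] * STATE_DIM] * 50
for obs in observations:
    mem.step(obs)

print(f"steps={len(observations)} writes={mem.writes} state_len={len(mem.state)}")
```

In this toy run, writes happen only in the first few steps after the scene change, and the state length never grows. A learned gate would replace the hand-set threshold rule used here.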

The reported benchmarks indicate that the approach does not sacrifice accuracy. On a synthetic closed-loop task, AURA-Mem matches the best O(1) baseline while using 5.19 to 6.13 times fewer writes overall, and up to 9.19 times fewer on easier configurations. Random or periodic write schedules do not replicate this gain, which indicates that the action-surprise gating signal, not just a lower write rate, is responsible. On the LIBERO-Long benchmark using OpenVLA-OFT 7B (60 episodes per arm), AURA-Mem achieves a success rate of 0.233, identical to the ungated base policy and slightly above the always-write KV arm at 0.217, while using 7.0 times fewer writes and constant memory. With 60 episodes per arm, that gap corresponds to about one episode, so it is better read as parity than as an improvement. The paper also derives a theoretical value-loss bound, though at this scale it remains vacuous. AURA-Mem points toward practical, memory-efficient robot policies that can run indefinitely on edge hardware.
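To put the LIBERO-Long success rates in perspective, the short calculation below converts them back into episode counts. It assumes each rate is simply successes divided by the 60 episodes per arm, with no partial credit. The article does not state the raw counts, so the values are implied, not reported.

```python
EPISODES_PER_ARM = 60  # from the article


def implied_successes(rate, n=EPISODES_PER_ARM):
    """Nearest whole number of successful episodes consistent with a rounded rate."""
    return round(rate * n)


# Success rates as quoted in the article.
arms = {"AURA-Mem": 0.233, "ungated base policy": 0.233, "always-write KV": 0.217}

for name, rate in arms.items():
    k = implied_successes(rate)
    print(f"{name}: about {k}/{EPISODES_PER_ARM} = {k / EPISODES_PER_ARM:.3f}")
```

Under that assumption, the gated and always-write arms differ by a single episode.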

Key Points
  • AURA-Mem uses a constant 4,224-byte inference state vs. a KV-cache that reaches 6,061x that size at 100,000 steps
  • On OpenVLA-OFT 7B (LIBERO-Long), it matches base policy success (0.233) with 7.0x fewer writes and constant memory
  • The learned action gate writes only when the observation would change the next action, outperforming random and periodic write schedules
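The memory figures above imply a KV-cache size that the article does not state directly. The arithmetic below is a rough sketch, assuming the 6,061x ratio is measured against the same 4,224-byte state and that the cache grows strictly linearly from zero.

```python
STATE_BYTES = 4_224    # constant AURA-Mem inference state, from the article
RATIO_AT_100K = 6_061  # KV-cache size relative to that state at 100,000 steps
STEPS = 100_000

# Implied KV-cache size at 100,000 steps.
kv_bytes = STATE_BYTES * RATIO_AT_100K

# Implied per-step growth, assuming linear growth starting from an empty cache.
per_step = kv_bytes / STEPS

print(f"KV-cache at {STEPS:,} steps: {kv_bytes:,} bytes (~{kv_bytes / 1e6:.1f} MB)")
print(f"Implied growth: ~{per_step:.0f} bytes per step")
```

This works out to roughly 25.6 MB at 100,000 steps, or about 256 bytes of cache per step, against a state that stays at 4,224 bytes.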

Why It Matters

Constant-size memory with sparse writes could make long-running robot policies feasible on edge hardware with limited memory and flash endurance, which is critical for real-world embodied AI.