Research & Papers

POLAR framework personalizes embodied AI agents with long-term memory

New memory system lets AI agents recall your habits from past interactions.

Deep Dive

Embodied AI agents powered by multimodal large language models (MLLMs) are getting better at following generic instructions and recognizing objects, but true personalization remains a challenge. In real-world settings, users often rely on implicit context from past interactions, for example “bring me the one I asked about yesterday.” To bridge this gap, researchers from KAIST (the authors are not explicitly named; the affiliation is inferred from the paper) introduce POLAR, a memory-augmented framework that enables agents to learn and recall long-term user preferences.

POLAR structures prior interactions into a multimodal knowledge graph with two memory types: semantic memory stores personalized concepts and visual cues (like “this user always drinks black coffee”), while episodic memory captures embodied experiences such as past navigation trajectories and object manipulations. When a new task arrives, POLAR retrieves only the most relevant memories, reducing noise and improving reasoning. The team tested POLAR across multiple MLLM backbones (e.g., GPT-4V, LLaVA) in diverse simulation environments. Results show consistent gains, with particularly strong improvements in tasks requiring multi-step inference, updates to user preferences over time, and disambiguation of vague references. POLAR marks a step toward AI assistants that truly understand individual users across days or weeks of interaction.
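To make the two memory types and the selective retrieval step more concrete, here is a minimal Python sketch. It is not POLAR's implementation: the class names (MemoryNode, MemoryGraph), the day field, and the keyword-overlap scoring are all assumptions chosen for illustration, and the paper's actual retrieval method is likely more sophisticated. The sketch only shows the general idea of storing semantic and episodic memories as linked graph nodes and returning a small, relevant subset for a new request.

```python
import re
from dataclasses import dataclass
from typing import Optional


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class MemoryNode:
    node_id: str
    kind: str                        # "semantic" or "episodic" (hypothetical labels)
    text: str                        # textual description of the memory
    day: int = 0                     # hypothetical timestamp, as a day index
    image_ref: Optional[str] = None  # placeholder for a visual cue, unused here


class MemoryGraph:
    """Toy knowledge graph: nodes are memories, edges link related memories."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}  # node_id -> set of neighbouring node ids

    def add(self, node, links=()):
        self.nodes[node.node_id] = node
        self.edges.setdefault(node.node_id, set())
        for other in links:
            # Edges are undirected in this sketch.
            self.edges[node.node_id].add(other)
            self.edges.setdefault(other, set()).add(node.node_id)

    def retrieve(self, query, k=2):
        """Return the top-k nodes by keyword overlap, plus their direct neighbours."""
        query_tokens = set(tokenize(query))
        scored = []
        for node in self.nodes.values():
            overlap = len(query_tokens & set(tokenize(node.text)))
            if overlap:
                scored.append((overlap, node.day, node.node_id))
        # Higher overlap first; ties are broken in favour of more recent memories.
        scored.sort(reverse=True)
        selected = [node_id for _, _, node_id in scored[:k]]
        # One-hop expansion: pull in memories linked to the selected ones.
        for node_id in list(selected):
            for neighbour in sorted(self.edges[node_id]):
                if neighbour not in selected:
                    selected.append(neighbour)
        return [self.nodes[node_id] for node_id in selected]


graph = MemoryGraph()
graph.add(MemoryNode("s1", "semantic", "user drinks black coffee", day=1))
graph.add(
    MemoryNode("e1", "episodic", "fetched coffee mug from kitchen counter", day=2),
    links=["s1"],
)
graph.add(MemoryNode("e2", "episodic", "navigated to bedroom and picked up book", day=3))

hits = graph.retrieve("bring my coffee", k=1)
hit_ids = [node.node_id for node in hits]
print(hit_ids)
```

In this toy example the unrelated bedroom episode is left out of the result, which mirrors the stated goal of retrieving only relevant memories to reduce noise. A real system would presumably replace keyword overlap with embedding-based or graph-aware retrieval.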

Key Points
  • POLAR uses a multimodal knowledge graph to store both semantic (personalized concepts) and episodic (agent trajectories) memories.
  • The framework consistently improves task performance across multiple MLLM backbones, with most gains in multi-hop reasoning and tracking user-specific updates.
  • The retrieval mechanism focuses on relevant past interactions, enabling agents to interpret implicit user requests like “the one I asked about yesterday.”

Why It Matters

POLAR points toward AI assistants that learn from long-term user history, reducing the need for repetitive instructions in homes and workplaces.