
PRISM Model Reveals Hidden Goal Switching in Robots and Animals

New AI method deciphers when and why agents switch intentions mid-task

Deep Dive

Traditional inverse reinforcement learning (IRL) assumes a single stationary reward function, so it cannot capture goal switching within an episode. Recent multi-intention IRL methods do segment trajectories. However, they model transitions between intentions in one of two ways: as a memoryless Markov chain, or through manual state augmentation with a fixed history window. Both approaches struggle with complex, temporally dependent behaviors. The Probabilistic Recurrent Intention Switching Model (PRISM), proposed by Sheng, Zhu, and Boedecker, replaces these mechanisms with a lightweight recurrent network that maps the observation history to a per-step intention distribution. The authors prove that the resulting EM objective decomposes exactly into independent per-intention reward subproblems, each solvable in closed form. This yields an O(nK) E-step with no variational approximation.
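To make the mechanism concrete, here is a minimal Python sketch built on three assumptions. First, it uses a tiny Elman-style recurrent gate with hand-picked weights. Second, it treats n as the number of time steps and K as the number of intentions. Third, it assumes the intention at each step depends only on the observation history, not on the previous intention. Under that third assumption the posterior over intentions factorizes per step, so the E-step visits each (step, intention) pair once, for O(nK) work. The names `intention_gate` and `e_step`, the weight shapes, and the per-intention action likelihoods are all hypothetical illustrations. They are not PRISM's published code or architecture.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax over the K intention logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def intention_gate(observations: Sequence[Sequence[float]],
                   W_in: Matrix, W_h: Matrix, W_out: Matrix) -> Matrix:
    """Hypothetical recurrent gate: map each observation prefix to pi_t over K intentions.

    Assumed shapes: W_in is H x D, W_h is H x H, W_out is K x H.
    """
    hidden = [0.0] * len(W_h)  # h_0 = 0
    gate_probs = []
    for x in observations:
        # Elman update h_t = tanh(W_in x_t + W_h h_{t-1}); the comprehension reads the old hidden.
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(W_in[i], x))
                      + sum(w * hj for w, hj in zip(W_h[i], hidden)))
            for i in range(len(W_h))
        ]
        logits = [sum(w * hj for w, hj in zip(row, hidden)) for row in W_out]
        gate_probs.append(softmax(logits))
    return gate_probs


def e_step(gate_probs: Matrix, action_liks: Matrix) -> Matrix:
    """Per-step posterior: q_t(k) is proportional to pi_t(k) * p(a_t | s_t, k). O(n*K) total.

    Valid only under the assumption that intentions depend on the observation
    history rather than the previous intention, so no forward-backward pass is needed.
    """
    posteriors = []
    for pi_t, lik_t in zip(gate_probs, action_liks):
        joint = [p * l for p, l in zip(pi_t, lik_t)]
        norm = sum(joint)
        posteriors.append([j / norm for j in joint])
    return posteriors


# Toy usage: n = 3 steps, D = 2 observation features, H = 2 hidden units, K = 2 intentions.
observations = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
W_in = [[1.0, 0.0], [0.0, 1.0]]
W_h = [[0.5, 0.0], [0.0, 0.5]]
W_out = [[2.0, -2.0], [-2.0, 2.0]]
# Made-up likelihoods p(a_t | s_t, k) of the observed actions under each intention's policy.
action_liks = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.6]]

gate_probs = intention_gate(observations, W_in, W_h, W_out)
posteriors = e_step(gate_probs, action_liks)
for t, q in enumerate(posteriors):
    print(f"step {t}: " + ", ".join(f"{p:.3f}" for p in q))
```

Suppose the gate also conditioned on the previous intention, which is the Markov-chain setting described above. In that case this per-step normalization would no longer be exact, and a forward-backward pass costing O(nK^2) would be needed instead.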

PRISM is evaluated in three domains: a non-Markovian gridworld, a mouse labyrinth navigation task, and BridgeData V2 robotic manipulation. The authors describe the BridgeData V2 experiments as the first large-scale robotic application of multi-intention IRL. In every setting, PRISM achieves the highest held-out log-likelihood among the compared methods. It does so while recovering nameable, temporally coherent intentions from completely unlabeled demonstrations. The results suggest that discrete goal switching occurs in both biological and artificial agents. PRISM offers a principled framework for understanding and leveraging these intention dynamics in autonomous systems, robotics, and neuroscience.
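Held-out log-likelihood is the comparison metric in these results. Below is a minimal Python sketch of how such a score could be computed for one unseen trajectory. It assumes per-step intention probabilities pi_t(k) from a trained gate and per-intention action likelihoods. It also assumes that intentions depend only on the observation history, so the unknown intention can be summed out step by step: log p(a_1..a_n) = sum_t log sum_k pi_t(k) p(a_t | s_t, k). The function name and inputs are hypothetical, and this summary does not describe the paper's exact evaluation protocol.

```python
import math
from typing import Sequence


def heldout_log_likelihood(gate_probs: Sequence[Sequence[float]],
                           action_liks: Sequence[Sequence[float]]) -> float:
    """Hypothetical scorer: sum_t log sum_k pi_t(k) * p(a_t | s_t, k).

    gate_probs[t][k]: assumed intention probability at step t.
    action_liks[t][k]: assumed likelihood of the observed action under intention k.
    """
    total = 0.0
    for pi_t, lik_t in zip(gate_probs, action_liks):
        # Marginalize the unknown intention at this step, then accumulate in log space.
        total += math.log(sum(p * l for p, l in zip(pi_t, lik_t)))
    return total


# Toy held-out trajectory with n = 2 steps and K = 2 intentions (made-up numbers).
gate_probs = [[0.5, 0.5], [1.0, 0.0]]
action_liks = [[0.2, 0.4], [0.5, 0.1]]
score = heldout_log_likelihood(gate_probs, action_liks)
print(f"held-out log-likelihood: {score:.4f}")
```

A higher (less negative) score means the model assigns more probability to the demonstrator's actual actions. A single-reward IRL model corresponds to the special case K = 1, where the gate always outputs probability 1.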

Key Points
  • PRISM uses a lightweight recurrent network to predict a per-step intention distribution from the observation history, overcoming the limitations of memoryless Markov chains and fixed history windows.
  • Has an O(nK) E-step and closed-form solutions for each per-intention reward subproblem, with no variational approximation.
  • Achieves the highest held-out log-likelihood on the gridworld, mouse labyrinth, and BridgeData V2 robotic manipulation benchmarks.
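The per-intention decomposition can be written out as a worked equation under the following assumptions, which may not match the paper's exact notation:

- the intention z_t is drawn from a recurrent gate pi_phi(z_t | h_t), where h_t is the observation history;
- actions follow a per-intention policy p(a_t | s_t; r_k) induced by reward r_k;
- q_t(k) is the E-step posterior, computed with the previous parameters.

```latex
\[
\begin{aligned}
\mathcal{Q}(\phi, r_{1:K})
&= \sum_{t=1}^{n} \sum_{k=1}^{K} q_t(k)\,\bigl[\log \pi_\phi(k \mid h_t) + \log p(a_t \mid s_t; r_k)\bigr] \\
&= \underbrace{\sum_{t=1}^{n} \sum_{k=1}^{K} q_t(k) \log \pi_\phi(k \mid h_t)}_{\text{gate update}}
 \;+\; \sum_{k=1}^{K} \underbrace{\sum_{t=1}^{n} q_t(k) \log p(a_t \mid s_t; r_k)}_{\text{reward subproblem } k}, \\[4pt]
q_t(k) &= \frac{\pi_\phi(k \mid h_t)\, p(a_t \mid s_t; r_k)}{\sum_{j=1}^{K} \pi_\phi(j \mid h_t)\, p(a_t \mid s_t; r_j)} .
\end{aligned}
\]
```

Each reward r_k appears only in its own weighted sum, so the K reward updates can be solved independently of one another and of the gate. Computing every q_t(k) touches each (t, k) pair once, which is consistent with an O(nK) E-step. Whether each subproblem has a closed form depends on the reward parameterization, which this summary does not specify.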

Why It Matters

By modeling when and why agents switch goals, PRISM could improve robot learning from human demonstrations. It could also give neuroscientists a new tool for modeling animal behavior.