PRISM Model Reveals Hidden Goal Switching in Robots and Animals
New AI method deciphers when and why agents switch intentions mid-task
Traditional inverse reinforcement learning (IRL) assumes a single stationary reward function, so it cannot capture goal switching within an episode. Recent multi-intention IRL methods segment trajectories, but they model intention transitions either as a memoryless Markov chain or through manual state augmentation with a fixed history window. Both approaches are limiting for complex, temporally dependent behaviors. The Probabilistic Recurrent Intention Switching Model (PRISM), proposed by Sheng, Zhu, and Boedecker, replaces these mechanisms with a lightweight recurrent network that maps the observation history to a per-step intention distribution. The authors prove that the resulting EM objective decomposes exactly into independent per-intention reward subproblems, each solvable in closed form, yielding an O(nK) E-step with no variational approximation.
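To make the mechanism concrete, here is a minimal sketch of a recurrent intention prior and a linear-time E-step. It is an illustration under stated assumptions, not the authors' implementation: it assumes n is the number of time steps and K the number of intentions, uses a plain tanh recurrent cell with hypothetical weight matrices W_in, W_rec, and W_out, and takes the per-intention action log-likelihoods as given rather than deriving them from learned rewards.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def recurrent_intention_prior(obs, W_in, W_rec, W_out):
    """Map an observation history to a per-step intention distribution.

    obs: (n, d) observations. Returns an (n, K) array whose row t depends
    only on obs[0..t]. The tanh cell and weight names are hypothetical.
    """
    n = obs.shape[0]
    h = np.zeros(W_rec.shape[0])            # hidden state summarising the history
    logits = np.empty((n, W_out.shape[0]))
    for t in range(n):
        h = np.tanh(W_in @ obs[t] + W_rec @ h)
        logits[t] = W_out @ h
    return softmax(logits, axis=1)


def e_step(prior, action_loglik):
    """Per-step intention responsibilities in O(nK) time.

    prior: (n, K) intention probabilities from the recurrent network.
    action_loglik: (n, K) values of log p(a_t | s_t, intention k), assumed given.
    Returns the (n, K) responsibilities and the total log-likelihood.
    """
    log_joint = np.log(prior) + action_loglik
    m = log_joint.max(axis=1, keepdims=True)
    log_evidence = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    resp = np.exp(log_joint - log_evidence[:, None])
    return resp, log_evidence.sum()


# Toy usage with random weights and random (made-up) action log-likelihoods.
rng = np.random.default_rng(0)
n, d, H, K = 6, 3, 8, 2
obs = rng.normal(size=(n, d))
W_in = rng.normal(size=(H, d))
W_rec = 0.5 * rng.normal(size=(H, H))
W_out = rng.normal(size=(K, H))
prior = recurrent_intention_prior(obs, W_in, W_rec, W_out)
resp, total_loglik = e_step(prior, rng.normal(size=(n, K)))
```

Because each step's intention prior is conditioned on the history through the hidden state, this sketch needs only one pass over n steps and K intentions, in contrast to the O(nK^2) forward-backward pass a Markov-chain transition model would require. Evaluating the same total log-likelihood on trajectories not used for fitting gives a held-out log-likelihood.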
PRISM is evaluated on three domains: a non-Markovian gridworld, a mouse labyrinth navigation task, and BridgeData V2 robotic manipulation, reported as the first large-scale robotic application of multi-intention IRL. Across all three settings, PRISM achieves the highest held-out log-likelihood while recovering nameable, temporally coherent intentions from completely unlabeled demonstrations. The results suggest that discrete goal switching is present in both biological and artificial agents, and that PRISM offers a principled framework for understanding and exploiting intention dynamics in autonomous systems, robotics, and neuroscience.
- PRISM uses a lightweight recurrent network to model intention transitions per step, overcoming limitations of memoryless Markov chains and fixed history windows.
- PRISM achieves an O(nK) E-step with closed-form per-intention reward solutions, requiring no variational approximation.
- PRISM achieves the highest held-out log-likelihood across the gridworld, mouse labyrinth, and BridgeData V2 robotic manipulation benchmarks.
Why It Matters
PRISM offers a principled way to identify goal switching in unlabeled behavior, which could improve robot learning from human demonstrations and advance neuroscience models.