Research & Papers

Gradient-based Planning for World Models at Longer Horizons

A new gradient-based method solves long-horizon planning problems that stumped previous AI systems.

Deep Dive

A research team from Meta AI and UC Berkeley has introduced GRASP (Gradient-based Planning for World Models), a breakthrough method that makes long-horizon planning practical for AI systems using learned world models. These models, which predict future states from current observations and actions, have become increasingly capable but have remained fragile when used for planning beyond short time horizons. GRASP addresses three critical failure points: it lifts trajectories into virtual states to parallelize optimization across time, adds stochasticity directly to state iterates for better exploration, and reshapes gradients to provide clean action signals while avoiding problematic "state-input" gradients through high-dimensional vision models.
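To make the baseline concrete: vanilla gradient-based planning treats the action sequence as the decision variable and backpropagates a trajectory cost through the world model's rollout, step by step. A minimal sketch under illustrative assumptions (a toy linear world model x_{t+1} = A x_t + B a_t with a quadratic goal cost; all names are hypothetical, not the paper's code):

```python
import numpy as np

def rollout(A, B, x0, actions):
    """Unroll the (assumed linear) world model x_{t+1} = A x_t + B a_t."""
    xs = [x0]
    for a in actions:
        xs.append(A @ xs[-1] + B @ a)
    return xs

def trajectory_cost(xs, goal):
    """Quadratic distance-to-goal, summed over the rollout."""
    return sum(float((x - goal) @ (x - goal)) for x in xs[1:])

def action_gradients(A, B, xs, goal):
    """Adjoint (reverse-mode) pass: the gradient signal is chained back
    through every step -- the very chain that becomes ill-conditioned
    at long horizons."""
    T = len(xs) - 1
    grads = [None] * T
    lam = 2.0 * (xs[T] - goal)                       # dJ/dx_T
    grads[T - 1] = B.T @ lam
    for t in range(T - 2, -1, -1):
        lam = A.T @ lam + 2.0 * (xs[t + 1] - goal)   # dJ/dx_{t+1}
        grads[t] = B.T @ lam
    return grads

rng = np.random.default_rng(0)
A, B = 0.8 * np.eye(2), np.eye(2)
x0, goal = np.zeros(2), np.ones(2)
actions = [0.01 * rng.normal(size=2) for _ in range(30)]

initial_cost = trajectory_cost(rollout(A, B, x0, actions), goal)
for _ in range(500):
    xs = rollout(A, B, x0, actions)
    grads = action_gradients(A, B, xs, goal)
    actions = [a - 0.01 * g for a, g in zip(actions, grads)]
final_cost = trajectory_cost(rollout(A, B, x0, actions), goal)
```

Note how every action's gradient accumulates terms through all later time steps; with a deep nonlinear model in place of A, those chained factors are exactly what explode or vanish.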

The technical innovation lies in how GRASP transforms the planning optimization problem. Traditional gradient-based planning suffers from exploding/vanishing gradients when backpropagating through 100+ step rollouts, creating ill-conditioned computation graphs. GRASP's virtual state formulation allows simultaneous optimization across all time steps rather than sequential processing, while its gradient reshaping prevents the brittleness that occurs when gradients pass through high-dimensional latent spaces of modern vision models. This enables AI agents to effectively use powerful world models—which can predict sequences of future observations in visual spaces—for practical control and decision-making tasks requiring long-term reasoning.

What makes GRASP particularly significant is its timing. As world models scale from task-specific predictors to general-purpose simulators, the bottleneck shifts from model capability to planning efficiency. The research team, including Yann LeCun and colleagues, demonstrates that GRASP can handle horizons where previous methods fail, opening doors for more sophisticated AI agents that can plan complex sequences of actions in realistic environments. This represents a crucial step toward AI systems that don't just predict what happens next, but can strategically optimize actions over extended timeframes.

Key Points
  • Parallelizes optimization across time via virtual states, enabling simultaneous planning across all steps
  • Adds stochasticity directly to state iterates for 40% better exploration in complex environments
  • Reshapes gradients to avoid brittle failures when passing through high-dimensional vision model latents
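The last two points can be pictured as small modifications inside a planner's update loop. A minimal sketch (the annealed noise schedule and the per-step gradient normalization below are illustrative guesses, not the paper's exact rules):

```python
import numpy as np

rng = np.random.default_rng(2)

def perturb_states(X, step, total_steps, scale=0.1):
    """Inject annealed Gaussian noise into the state iterates for exploration.
    The linear decay schedule is an illustrative choice."""
    sigma = scale * (1.0 - step / total_steps)
    return X + sigma * rng.normal(size=X.shape)

def reshape_grad(g, eps=1e-8):
    """Normalize each time step's gradient to unit norm, so no single step's
    exploding (or vanished) gradient dominates the update.
    One plausible reshaping rule; the paper's exact rule may differ."""
    norms = np.linalg.norm(g, axis=-1, keepdims=True)
    return g / (norms + eps)

# A badly scaled per-step gradient: one huge row, one tiny row, one moderate row.
g = np.array([[1e6, 0.0], [0.0, 1e-3], [3.0, 4.0]])
row_norms = np.linalg.norm(reshape_grad(g), axis=-1)

X = np.zeros((4, 2))
unchanged = perturb_states(X, 10, 10)    # sigma hits zero at the final step
```

After reshaping, every time step contributes an update of comparable magnitude, while the noise on the state iterates decays to zero as the plan converges.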

Why It Matters

Enables AI agents to perform strategic planning over 100+ steps, moving from reactive systems to truly forward-thinking artificial intelligence.