Robotics

PiJEPA AI combines Octo policy with JEPA world model for smarter robot navigation

arXiv cs.RO March 30, 2026

⚡ New two-stage framework uses policy-guided planning to improve robot navigation accuracy by 40% over existing methods.

Deep Dive

Researchers Amirhosein Chahe and Lifeng Zhou have introduced PiJEPA, a two-stage AI framework designed to solve a core challenge in robotics: getting a robot to navigate to a visually specified goal guided by natural language instructions. Existing methods typically rely on either reactive policies, which fail at long-term planning, or world models, whose planners struggle to initialize action sequences in complex environments. PiJEPA combines both approaches. Its first stage fine-tunes a generalist policy called Octo, augmented with a frozen vision encoder such as DINOv2 or V-JEPA-2, on the CAST dataset. The result is a 'policy prior' that proposes likely actions based on the current camera view and the user's command.

In the second stage, this informed action distribution is used to 'warm-start' the planner. Instead of sampling from an uninformed distribution, the planner initializes a Model Predictive Path Integral (MPPI) algorithm from the policy's suggestions and then plans over a separately trained JEPA world model, which predicts future states in the vision encoder's latent space. Because the search starts from an informed guess rather than a random one, PiJEPA's planner converges faster to high-quality action sequences that reach the goal. Experiments demonstrate that this hybrid approach significantly outperforms using either the policy or the world model alone, leading to more accurate and reliable robot navigation that follows the given instructions.
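
To make the warm-start idea concrete, here is a minimal Python sketch of MPPI whose sampling distribution starts from a policy-supplied mean and standard deviation. It is an illustration under stated assumptions, not the authors' implementation: `toy_dynamics` is a hypothetical stand-in for the learned JEPA predictor, the cost is assumed to be the Euclidean distance between the final predicted latent and a goal latent, and every function name, parameter, and default value is invented for this example.

```python
import numpy as np


def toy_dynamics(z, a):
    """Hypothetical stand-in for a learned latent predictor: z' = z + a."""
    return z + a


def mppi_plan(z0, z_goal, dynamics, prior_mean, prior_std,
              n_samples=256, n_iters=5, temperature=1.0, seed=0):
    """Refine an action sequence with MPPI, starting from a policy prior.

    prior_mean, prior_std: arrays of shape (horizon, action_dim) that would
    come from the fine-tuned policy (assumed here, not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    mean = np.array(prior_mean, dtype=float)  # warm start: the policy's mean
    std = np.array(prior_std, dtype=float)    # kept fixed in this sketch
    z_start = np.asarray(z0, dtype=float)
    horizon = mean.shape[0]

    for _ in range(n_iters):
        # Sample candidate action sequences around the current mean: (N, H, A).
        noise = rng.standard_normal((n_samples,) + mean.shape)
        actions = mean[None] + std[None] * noise

        # Roll every candidate forward through the latent dynamics model.
        z = np.repeat(z_start[None, :], n_samples, axis=0)
        for t in range(horizon):
            z = dynamics(z, actions[:, t])

        # Assumed cost: distance of the final latent to the goal latent.
        cost = np.linalg.norm(z - z_goal, axis=1)

        # Exponentially weight low-cost samples, then average them.
        weights = np.exp(-(cost - cost.min()) / temperature)
        weights /= weights.sum()
        mean = np.einsum("n,nha->ha", weights, actions)

    return mean


# Toy usage: a 2-D "latent", a 3-step horizon, and a prior that already
# points roughly toward the goal.
z0 = np.zeros(2)
z_goal = np.array([3.0, 0.0])
prior_mean = np.tile([1.0, 0.0], (3, 1))
prior_std = np.full((3, 2), 0.1)
plan = mppi_plan(z0, z_goal, toy_dynamics, prior_mean, prior_std)
```

In this toy setup the prior already points at the goal, so the samples concentrate in a useful region from the first iteration. An uninformed planner would instead start from a zero mean or a broad distribution and need more samples or iterations to find comparable sequences.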

Key Points

Combines a fine-tuned Octo policy with a JEPA world model using MPPI planning, initialized by a policy-derived action distribution.
Systematically tested vision backbones DINOv2 and V-JEPA-2, showing the framework's adaptability to different visual encoders.
Outperforms standalone reactive policies and uninformed world model planners on real-world navigation tasks from the CAST dataset.

Why It Matters

Enables more reliable and instruction-following robots for logistics, home assistance, and search & rescue by improving long-horizon planning.
