RAMP: Hybrid DRL for Online Learning of Numeric Action Models
New AI framework learns numeric planning models on the fly, outperforming PPO in solvability and plan quality.
A team of researchers including Yarin Benyamin, Argaman Mordoch, Shahaf Shperberg, and Roni Stern has introduced RAMP (Reinforcement learning, Action Model learning, and Planning), a hybrid strategy for learning numeric action models online. Traditional automated planning requires a predefined model of an environment's actions, which is often difficult to obtain. Models can be learned from observations, but existing methods for numeric domains work offline and require pre-recorded expert traces. RAMP removes this limitation by learning through direct interaction with the environment, simultaneously training a Deep Reinforcement Learning (DRL) policy and refining a numeric action model from past experiences.
These components form a synergistic loop: the RL policy explores to gather data, which improves the accuracy of the learned action model. In turn, this improved model enables more effective planning, which guides further policy training. To bridge the gap between planning problems and RL environments, the team also developed Numeric PDDLGym, a framework for converting numeric planning problems into standardized Gym environments. Experimental validation on standard International Planning Competition (IPC) numeric domains showed that RAMP significantly outperforms the prominent DRL algorithm PPO (Proximal Policy Optimization), achieving higher rates of solvability and generating higher-quality plans.
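The feedback loop described above can be sketched in miniature. Everything below is illustrative and not taken from the paper: the toy one-number environment, the round-robin exploration standing in for the DRL policy, and the greedy planner are all assumptions chosen to make the explore → learn-model → plan cycle concrete.

```python
# Toy 1-D numeric environment: the state is a single number and the goal
# is to reach a value of at least 10. The two actions' numeric effects
# (+3 and -1) are unknown to the learner.
TRUE_EFFECTS = {0: 3.0, 1: -1.0}

def step(state, action):
    return state + TRUE_EFFECTS[action]

def collect_episode(horizon=8):
    """Explore and record (state, action, next_state) transitions.
    Round-robin action choice stands in for the DRL exploration policy."""
    s, transitions = 0.0, []
    for t in range(horizon):
        a = t % 2
        s2 = step(s, a)
        transitions.append((s, a, s2))
        s = s2
    return transitions

def refine_action_model(transitions):
    """Learn each action's numeric effect as the mean observed state delta."""
    deltas = {}
    for s, a, s2 in transitions:
        deltas.setdefault(a, []).append(s2 - s)
    return {a: sum(d) / len(d) for a, d in deltas.items()}

def plan(effects, start=0.0, goal=10.0, max_steps=20):
    """Greedy planner over the learned model: repeatedly apply the action
    with the largest learned effect until the goal value is reached."""
    s, actions = start, []
    while s < goal and len(actions) < max_steps:
        a = max(effects, key=effects.get)
        actions.append(a)
        s += effects[a]
    return actions, s

transitions = collect_episode()             # 1. explore to gather data
effects = refine_action_model(transitions)  # 2. refine the numeric action model
plan_actions, final_state = plan(effects)   # 3. plan with the learned model
print(effects, plan_actions, final_state)
```

In RAMP proper, step 3's plans would in turn guide further policy training, closing the loop; here the cycle is shown once for clarity.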
- Integrates DRL policy training with online action model learning in a single feedback loop.
- Outperformed the PPO algorithm in standard IPC numeric domain tests for solvability and plan quality.
- Introduced the Numeric PDDLGym framework to convert planning problems into RL-ready Gym environments.
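To illustrate the idea behind Numeric PDDLGym, the sketch below wraps a toy numeric planning problem in the standard Gym-style `reset`/`step` interface. The class, the tank-filling problem, and its action encoding are hypothetical stand-ins, not the framework's actual API.

```python
class NumericPlanningEnv:
    """Gym-style environment for a toy numeric problem: fill a tank to a
    target level. Numeric fluents become the observation, planning actions
    become env actions, and the goal condition becomes termination."""

    def __init__(self, capacity=10.0, target=7.0):
        self.capacity, self.target = capacity, target
        # Each action's numeric effect on the tank level.
        self.actions = {"pour_small": 1.0, "pour_large": 3.0, "drain": -2.0}
        self.level = 0.0

    def reset(self):
        self.level = 0.0
        return {"level": self.level}           # observation: numeric fluents

    def step(self, action):
        # Apply the action's numeric effect, clamped to the tank's bounds.
        delta = self.actions[action]
        self.level = min(max(self.level + delta, 0.0), self.capacity)
        done = self.level == self.target       # goal condition as termination
        reward = 1.0 if done else 0.0          # sparse goal-achievement reward
        return {"level": self.level}, reward, done, {}

env = NumericPlanningEnv()
obs = env.reset()
for a in ["pour_large", "pour_large", "pour_small"]:
    obs, reward, done, info = env.step(a)
print(obs, reward, done)
```

A conversion layer like this lets the same numeric planning problem serve both the planner (via its action model) and the RL policy (via the Gym interface).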
Why It Matters
Enables AI systems to autonomously learn complex, numeric world models, advancing towards more adaptive and capable autonomous agents.