Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning
A new exploration method takes aim at one of reinforcement learning's hardest problems: deciding when to try something new.
Researchers have introduced a new reinforcement learning algorithm called Value Bonuses with Ensemble Errors (VBE). It tackles a core challenge in RL: getting an agent to try actions whose outcomes it has never observed. VBE derives 'value bonuses' from the errors of an ensemble of random action-value functions and uses them to promote deep exploration. The paper shows VBE outperforms established methods such as Bootstrapped DQN, RND, and ACB on classic exploration benchmarks, and that it scales to complex Atari environments.
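The core mechanism can be illustrated with a minimal tabular sketch. This is not the paper's implementation, just an assumed simplification of the idea: an ensemble of fixed random action-value tables serves as prediction targets, learned predictors are trained toward them on visited state-action pairs, and the remaining prediction error acts as an exploration bonus that decays where the agent has already been. All names and sizes (`K`, `targets`, `predictors`, `value_bonus`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, K = 10, 4, 5

# Fixed random action-value tables: stand-ins for the random target
# action-value functions in the ensemble (hypothetical simplification).
targets = rng.normal(size=(K, n_states, n_actions))

# Learned predictors, one per ensemble member, initialized at zero.
predictors = np.zeros((K, n_states, n_actions))

def value_bonus(s, a):
    """Exploration bonus: mean squared ensemble prediction error at (s, a)."""
    errors = (targets[:, s, a] - predictors[:, s, a]) ** 2
    return errors.mean()

def update_predictors(s, a, lr=0.5):
    """Move each predictor toward its random target at the visited (s, a)."""
    predictors[:, s, a] += lr * (targets[:, s, a] - predictors[:, s, a])

# Repeatedly visiting one state-action pair drives its bonus down,
# while unvisited pairs keep a high bonus, steering the agent toward novelty.
b_before = value_bonus(3, 1)
for _ in range(20):
    update_predictors(3, 1)
b_after = value_bonus(3, 1)
```

In the full method these bonuses are folded into the value targets rather than added per step, which is what yields deep (multi-step) exploration; the sketch above only shows why ensemble prediction error works as a novelty signal.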
Why It Matters
Better exploration is key to building AI agents that can learn complex, real-world tasks more efficiently and autonomously.