Scalable Multi-Task Learning through Spiking Neural Networks with Adaptive Task-Switching Policy for Intelligent Autonomous Agents
A new adaptive task-switching policy mitigates multi-task interference in spiking neural networks, enabling longer autonomous operation.
A team of researchers from NYU Abu Dhabi and TU Wien has introduced SwitchMT, a novel methodology designed to overcome a critical bottleneck in training intelligent autonomous agents. The core challenge in multi-task reinforcement learning (RL) is task interference, where learning one skill degrades performance on another. Current state-of-the-art approaches using Spiking Neural Networks (SNNs) for their energy efficiency often rely on fixed schedules for switching between tasks during training, which limits performance and scalability. SwitchMT directly addresses this by implementing an adaptive policy that decides when to switch tasks based on real-time learning progress.
The methodology employs two key innovations: a Deep Spiking Q-Network enhanced with 'active dendrites' and a dueling architecture to create specialized sub-networks for different tasks, and a novel adaptive task-switching policy. Rather than relying on external rewards alone, this policy also monitors the internal dynamics of the network's parameters to determine the optimal moment to change focus. In benchmarks across multiple Atari games, SwitchMT achieved competitive scores, including 355.2 in Enduro and 5.6 in Breakout, and, crucially, enabled agents to operate for significantly longer episodes than prior methods. This demonstrates a path toward more capable and scalable autonomous systems that can learn diverse skills simultaneously without a proportional increase in computational power or network size, a vital step for real-world deployment in robotics and embedded AI.
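To make the idea concrete, here is a minimal sketch of what such an adaptive switching rule could look like. This is an illustration, not the paper's actual algorithm: the class name, thresholds, and the specific plateau test are all hypothetical, standing in for the general principle of combining reward signals with parameter dynamics.

```python
from collections import deque


class AdaptiveTaskSwitcher:
    """Illustrative sketch (not SwitchMT's exact rule): switch tasks when
    the reward curve has plateaued AND the network's parameter updates
    have shrunk, suggesting learning on the current task has saturated."""

    def __init__(self, window=20, reward_eps=0.01, param_eps=1e-3):
        self.window = window            # episodes considered per decision
        self.reward_eps = reward_eps    # relative plateau threshold on reward
        self.param_eps = param_eps      # threshold on mean parameter-update norm
        self.rewards = deque(maxlen=window)
        self.param_deltas = deque(maxlen=window)

    def update(self, episode_reward, param_update_norm):
        # Called once per training episode with the external reward and
        # some measure of how much the network's weights changed.
        self.rewards.append(episode_reward)
        self.param_deltas.append(param_update_norm)

    def should_switch(self):
        if len(self.rewards) < self.window:
            return False  # not enough evidence yet
        half = self.window // 2
        earlier = sum(list(self.rewards)[:half]) / half
        recent = sum(list(self.rewards)[half:]) / (self.window - half)
        # Reward signal: recent episodes no better than earlier ones.
        reward_plateaued = abs(recent - earlier) < self.reward_eps * max(abs(earlier), 1.0)
        # Internal signal: weights have largely stopped moving.
        params_settled = (sum(self.param_deltas) / self.window) < self.param_eps
        return reward_plateaued and params_settled
```

Contrast this with a fixed schedule, which would switch every N episodes regardless of whether the current task has actually been learned; the adaptive rule instead spends exactly as long on each task as its learning curve warrants.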
- Uses a Deep Spiking Q-Network with active dendrites to create task-specific sub-networks within a single model.
- Introduces an adaptive task-switching policy that leverages both reward signals and internal network parameter dynamics.
- Achieved benchmark scores of -8.8 in Pong, 5.6 in Breakout, and 355.2 in Enduro, enabling longer operational episodes.
Why It Matters
Enables more capable, longer-lasting autonomous robots and embedded AI by efficiently mitigating multi-task interference.