Cortex learns structures, subcortex handles rewards under memory limits
New theory reveals how brain splits learning when cortical memory runs short.
A new theoretical paper by Matthew Farrell and Taro Toyoizumi (RIKEN CBS) examines how the brain divides labor between the cortex and subcortex when memory is limited. The researchers extended classic dual-system reinforcement learning models, in which a flexible, model-based module (cortex) learns alongside a simpler, model-free module (subcortex), by explicitly constraining the memory capacity of the model-based system. They then simulated decision-making tasks to compare how different memory allocation strategies perform under various reward dynamics.
The key finding: when the rewarded states change often (a volatile environment), the optimal strategy for the memory-constrained model-based system is not to memorize which states currently give reward but to learn the general transition structure of the environment. This lets the cortex build a reusable world model while the subcortex exploits moment-to-moment rewards. The work provides a computational justification for the observed functional dissociation, with the cortex as a structure learner and the subcortex as a reward tracker, and offers testable predictions for neural experiments. For AI, it suggests that resource-efficient agents might benefit from similar memory-aware specialization.
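To make the intuition concrete, here is a minimal toy sketch of why a learned transition structure stays useful when the rewarded state moves. It is not the authors' model and does not impose their memory constraint. The ring-shaped world, the function names (`step`, `learn_transitions`, `plan`), and all parameter values are assumptions chosen purely for illustration. The "cortex" part learns transition probabilities once, and the current reward location is passed in separately, standing in for a fast reward tracker.

```python
import random

# Hypothetical toy world: 6 states on a ring, 2 actions (0 = left, 1 = right).
N_STATES, N_ACTIONS, GAMMA = 6, 2, 0.9

def step(state, action):
    """Environment dynamics (assumed for illustration): move one step around the ring."""
    return (state + (1 if action == 1 else -1)) % N_STATES

def learn_transitions(n_samples=500, seed=0):
    """Structure learning: estimate P(s' | s, a) from randomly sampled experience."""
    rng = random.Random(seed)
    counts = [[[0] * N_STATES for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    for _ in range(n_samples):
        s, a = rng.randrange(N_STATES), rng.randrange(N_ACTIONS)
        counts[s][a][step(s, a)] += 1
    model = []
    for s in range(N_STATES):
        row = []
        for a in range(N_ACTIONS):
            total = sum(counts[s][a])
            # Normalize counts into probabilities; unseen pairs stay at zero.
            row.append([c / total if total else 0.0 for c in counts[s][a]])
        model.append(row)
    return model

def plan(model, reward, n_iters=100):
    """Value iteration on the learned model; reward[s'] is received on entering s'."""
    values = [0.0] * N_STATES
    for _ in range(n_iters):
        q = [[sum(p * (reward[s2] + GAMMA * values[s2])
                  for s2, p in enumerate(model[s][a]))
              for a in range(N_ACTIONS)]
             for s in range(N_STATES)]
        values = [max(row) for row in q]
    # Greedy action per state under the current reward vector.
    return [row.index(max(row)) for row in q]

# The transition model is learned once, then reused as the rewarded state moves.
model = learn_transitions()
reward_at_3 = [1.0 if s == 3 else 0.0 for s in range(N_STATES)]
reward_at_0 = [1.0 if s == 0 else 0.0 for s in range(N_STATES)]
print(plan(model, reward_at_3))
print(plan(model, reward_at_0))
```

In this sketch, moving the reward only requires re-running `plan` with a new reward vector. Nothing stored in `model` has to be relearned, which is the sense in which structure knowledge is reusable in a volatile environment.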
- Extended model-based (cortex) vs model-free (subcortex) learning framework with explicit memory constraints on the cortex module.
- When reward states change frequently, the optimal cortical strategy switches from reward exploitation to learning general environment structure.
- Provides a theoretical foundation for cortical-subcortical dissociation: cortex learns structure, subcortex handles reward-based learning.
Why It Matters
Explains the brain's resource-efficient split of learning between cortex and subcortex, and offers design principles for memory-constrained AI agents.