Research & Papers

The macaque IT cortex but not current artificial vision networks encode object position in perceptually aligned coordinates

Macaque brain research shows biological vision adapts to illusions, while AI models like GPT-4V fail to replicate this perceptual alignment.

Deep Dive

A neuroscience study led by researchers from MIT and York University reveals a fundamental difference between biological and artificial vision systems. Using the motion aftereffect illusion (which shifts perceived object position without changing retinal input), the team recorded from macaque inferior temporal (IT) cortex and found systematic, perception-aligned position biases in neural population codes. These biases directly mirrored human psychophysical reports, demonstrating that biological vision represents object location in coordinates aligned with conscious perception rather than raw retinal (pixel-based) coordinates.

When testing artificial vision systems, including standard feedforward networks, recurrent architectures, and state-of-the-art video models, researchers found these AI systems accurately encode pixel-level object position but completely fail to exhibit the adaptation-induced perceptual shifts seen in biological vision. However, applying transformations derived from the macaque IT adaptation dynamics to model feature spaces was sufficient to generate similar biases, suggesting a pathway for bridging this gap. The work, published on arXiv (2603.11248), indicates current AI vision lacks the history-dependent, perceptually grounded spatial coding inherent to biological systems.
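The core manipulation described above, applying an adaptation-derived shift in a model's feature space so that a position readout becomes biased toward the perceived rather than the retinal location, can be illustrated with a toy sketch. This is not the paper's code: the linear encoder, least-squares decoder, and shift magnitude below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 50

# Hypothetical feedforward encoding: scalar position -> feature vector
w_enc = rng.normal(size=n_features)

def encode(position):
    """Map retinal position to a feature vector (illustrative linear code)."""
    return position * w_enc

# Fit a least-squares linear decoder that reads position back out of features
positions = np.linspace(-1, 1, 21)
X = np.stack([encode(p) for p in positions])
w_dec, *_ = np.linalg.lstsq(X, positions, rcond=None)

def decode(features):
    return features @ w_dec

# Without adaptation: the decoded position matches the retinal position
feat = encode(0.5)
print(decode(feat))  # pixel-aligned readout, ~0.5

# Adaptation-derived transform: displace the features as if the object had
# shifted opposite to the adapting motion (the motion-aftereffect direction).
# The magnitude here is an arbitrary illustrative choice.
aftereffect_shift = -0.1
feat_adapted = feat + encode(aftereffect_shift)
print(decode(feat_adapted))  # readout now biased toward perceived position, ~0.4
```

The point of the sketch is that the decoder itself never changes; a history-dependent transformation applied purely in feature space is enough to move the decoded position away from the retinal input, which is the kind of bias the study induced in artificial model features.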

Key Points
  • Macaque IT cortex shows perceptual position shifts during motion aftereffect illusion, matching human reports
  • AI vision models tested (feedforward, recurrent, and video-based) fail to replicate these perceptual alignment effects
  • Applying neural-derived transformations to model features can induce similar biases, pointing to a fixable gap

Why It Matters

This reveals a core limitation in current computer vision AI, impacting applications in robotics, AR/VR, and autonomous systems where perceptual alignment is critical.