SGAP-Gaze AI cuts driver gaze error by 23.5% with scene-aware attention
New model fuses face and traffic scene data to predict where drivers are looking, with a mean error of 104.73 pixels.
Researchers Pavan Kumar Sharma and Pranamesh Chakraborty have introduced SGAP-Gaze, a new AI model that estimates where a driver is looking more accurately than prior approaches. Unlike previous models that rely solely on facial features, SGAP-Gaze explicitly incorporates visual context from the traffic scene outside the vehicle. Its novel Scene-Grid Attention mechanism, built on a Transformer architecture, fuses features from the driver's face, eyes, and iris with the surrounding road environment. This multi-modal fusion produces a more robust "gaze intent vector."
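One plausible way to picture the fusion step is as cross-attention: a query built from the driver's face, eye, and iris features scores each cell of a grid laid over the road scene, and the resulting weights indicate which part of the scene the gaze is likely directed at. The sketch below is a minimal, hypothetical illustration using single-head scaled dot-product attention in plain Python. The function names `scene_grid_attention` and `fuse`, the 2x2 grid, the toy one-hot features, and the concatenation used as a stand-in for the "gaze intent vector" are all assumptions, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scene_grid_attention(face_query: Vector, grid_cells: List[Vector]) -> Tuple[List[float], Vector]:
    """Hypothetical single-head cross-attention over scene-grid cells.

    The face-derived query scores every grid cell (scaled dot product), the
    scores become attention weights, and the context is the weighted sum of
    cell features. Keys and values are the raw cell features here; a real
    Transformer layer would apply learned projections first.
    """
    d = len(face_query)
    scores = [dot(face_query, cell) / math.sqrt(d) for cell in grid_cells]
    weights = softmax(scores)
    context = [sum(w * cell[i] for w, cell in zip(weights, grid_cells)) for i in range(d)]
    return weights, context

def fuse(face_query: Vector, context: Vector) -> Vector:
    """Hypothetical fusion: concatenate face features with the attended scene context."""
    return face_query + context

# Toy 2x2 scene grid with one-hot 4-dim features per cell (illustrative only).
cell_names = ["top-left", "top-right", "bottom-left", "bottom-right"]
grid = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
face = [2.0, 0.0, 0.0, 0.0]  # toy face/eye/iris embedding aligned with the top-left cell

weights, context = scene_grid_attention(face, grid)
fused = fuse(face, context)  # stand-in for a fused "gaze intent vector"
best = max(range(len(weights)), key=lambda i: weights[i])
print(cell_names[best])  # top-left
print(len(fused))  # 8
```

In a trained model the query, keys, and values would come from learned encoders and projections rather than hand-set vectors; the sketch only shows how attention weights let facial features select relevant regions of the scene.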
To train and test their model, the team also created a new benchmark dataset, Urban Driving-Face Scene Gaze (UD-FSG), which contains synchronized images of driver faces and the corresponding traffic scenes. On this dataset, SGAP-Gaze achieved a mean pixel error of 104.73, a 23.5% reduction in error compared with existing state-of-the-art methods. The model is particularly accurate in the outer regions of a scene, areas that are critical for spotting hazards but are often missed by other systems. The result highlights the value of combining scene-aware attention with traditional facial analysis to build safer, more reliable driver monitoring systems.
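Mean pixel error in scene-gaze estimation is typically the average Euclidean distance, in pixels, between predicted and ground-truth gaze points on the scene image; that definition is assumed here. The sketch below computes it, splits it into central and outer regions (one way to examine an outer-region claim), and gives the relative-reduction formula behind a figure such as 23.5%. The image size, the rule defining "outer" (outside a central box spanning the middle half of each dimension), the function names, and the toy points are hypothetical and do not reflect the UD-FSG evaluation protocol.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def mean_pixel_error(pred: List[Point], truth: List[Point]) -> float:
    """Average Euclidean distance (pixels) between predicted and ground-truth gaze points."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("need equal-length, non-empty point lists")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)

def is_outer(point: Point, width: int, height: int, margin: float = 0.25) -> bool:
    """Hypothetical region rule: 'outer' means outside the central box that
    spans the middle (1 - 2 * margin) of each image dimension."""
    x, y = point
    in_x = margin * width <= x <= (1 - margin) * width
    in_y = margin * height <= y <= (1 - margin) * height
    return not (in_x and in_y)

def error_by_region(pred: List[Point], truth: List[Point], width: int, height: int) -> Dict[str, float]:
    """Mean pixel error per region, bucketed by where the ground-truth point falls."""
    buckets: Dict[str, List[float]] = {"center": [], "outer": []}
    for p, t in zip(pred, truth):
        region = "outer" if is_outer(t, width, height) else "center"
        buckets[region].append(math.dist(p, t))
    return {k: sum(v) / len(v) for k, v in buckets.items() if v}

def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fractional error reduction, e.g. 0.235 corresponds to a 23.5% reduction."""
    return (baseline_error - new_error) / baseline_error

# Toy example on an assumed 1920x1080 scene image.
truth = [(960.0, 540.0), (100.0, 100.0)]  # one central point, one outer point
pred = [(963.0, 544.0), (106.0, 108.0)]   # errors of 5 px and 10 px
print(mean_pixel_error(pred, truth))               # 7.5
print(error_by_region(pred, truth, 1920, 1080))    # {'center': 5.0, 'outer': 10.0}
print(relative_reduction(200.0, 150.0))            # 0.25
```

Because the metric is in pixels, it depends on the scene image resolution, so error values are only directly comparable across methods evaluated on the same images.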
- SGAP-Gaze integrates driver facial data with traffic scene context using a Transformer-based attention mechanism.
- The model achieves a 23.5% lower mean pixel error (104.73 on UD-FSG) than previous state-of-the-art methods.
- It performs especially well in outer scene regions, crucial for detecting rare but critical driving hazards.
Why It Matters
Enables more precise monitoring of driver attention, a critical component for developing next-generation vehicle safety and autonomous driving systems.