SGAP-Gaze: Scene Grid Attention Based Point-of-Gaze Estimation Network for Driver Gaze
New model fuses driver-face and traffic-scene data to predict where drivers are looking, with a mean error of 104.73 pixels.
Researchers Pavan Kumar Sharma and Pranamesh Chakraborty have introduced SGAP-Gaze, a new AI model that substantially improves the accuracy of estimating a driver's point of gaze. Unlike previous models that rely solely on facial features, SGAP-Gaze explicitly incorporates visual context from the traffic scene outside the vehicle. It uses a novel Scene-Grid Attention mechanism, built on a Transformer architecture, to fuse features from the driver's face, eyes, and iris with the surrounding road environment, producing a more robust "gaze intent vector."
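For intuition, here is a minimal PyTorch sketch of how such a scene-grid cross-attention fusion could look: a pooled face/eye/iris embedding acts as a query that attends over a grid of scene-patch embeddings, and the fused token is regressed to a 2-D gaze point. All names, dimensions, and the regression head are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class SceneGridAttentionFusion(nn.Module):
    """Hypothetical sketch of scene-grid cross-attention fusion.

    The traffic scene is assumed to be encoded as an H x W grid of patch
    embeddings; the driver's face/eye/iris features form a single query
    token that attends over that grid. This is an illustration of the
    general idea, not SGAP-Gaze's published architecture.
    """

    def __init__(self, dim=256, grid_h=8, grid_w=8, heads=4):
        super().__init__()
        self.grid_tokens = grid_h * grid_w
        # Learned positional embedding for each scene-grid cell.
        self.grid_pos = nn.Parameter(torch.zeros(1, self.grid_tokens, dim))
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Regress a 2-D point of gaze (pixel coordinates) from the fused token.
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 2))

    def forward(self, face_feat, scene_grid):
        # face_feat:  (B, dim)              pooled face/eye/iris embedding
        # scene_grid: (B, grid_tokens, dim) per-cell scene embeddings
        query = face_feat.unsqueeze(1)                    # (B, 1, dim)
        keys = scene_grid + self.grid_pos                 # add grid positions
        fused, attn = self.cross_attn(query, keys, keys)  # attend over grid
        return self.head(fused.squeeze(1)), attn          # (B, 2) gaze point


# Toy usage with random tensors standing in for backbone features.
B, dim = 4, 256
model = SceneGridAttentionFusion(dim=dim)
gaze_xy, attn_map = model(torch.randn(B, dim), torch.randn(B, 64, dim))
print(gaze_xy.shape)  # torch.Size([4, 2])
```

The attention weights returned alongside the gaze point are what make a mechanism like this interpretable: they indicate which scene-grid cells most influenced the prediction.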
To train and test their model, the team also created a new benchmark dataset called Urban Driving-Face Scene Gaze (UD-FSG), which contains synchronized images of driver faces and the corresponding traffic scenes. On this dataset, SGAP-Gaze achieved a mean pixel error of 104.73, representing a 23.5% reduction in error compared to existing state-of-the-art methods. The model shows particular strength in accurately estimating gaze in the outer regions of a scene—areas that are critical for spotting hazards but are often missed by other systems. This advancement highlights the effectiveness of combining scene-aware attention with traditional facial analysis for building safer, more reliable driver monitoring systems.
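As a quick sanity check on the reported numbers, assuming the 23.5% reduction is measured against the prior best mean pixel error on UD-FSG, the implied baseline is about 104.73 / 0.765 ≈ 136.9 pixels. The snippet below shows the standard mean-pixel-error computation and that back-of-envelope arithmetic (function name is illustrative).

```python
import numpy as np

def mean_pixel_error(pred_xy, true_xy):
    # Mean Euclidean distance in pixels between predicted and
    # ground-truth point-of-gaze coordinates (both N x 2 arrays).
    return float(np.linalg.norm(pred_xy - true_xy, axis=1).mean())

# Back-of-envelope check: if 104.73 px reflects a 23.5% error reduction,
# the implied prior state-of-the-art error is roughly 136.9 px.
sgap_error = 104.73
implied_baseline = sgap_error / (1 - 0.235)
print(f"implied prior SOTA error: {implied_baseline:.1f} px")  # ~136.9
```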
- SGAP-Gaze integrates driver facial data with traffic scene context using a Transformer-based attention mechanism.
- The model achieves a 23.5% lower mean pixel error (104.73 on UD-FSG) than previous state-of-the-art methods.
- It performs especially well in outer scene regions, crucial for detecting rare but critical driving hazards.
Why It Matters
Enables more precise monitoring of driver attention, a critical component for developing next-generation vehicle safety and autonomous driving systems.