Image & Video

GATA2Floor: Graph attention model counts building floors from street-view images

A new model counts building floors without manual labels by combining self-supervised features with graph reasoning.

Deep Dive

GATA2Floor is a multi-head Graph Attention v2 (GATv2) model for counting floors in street-view facade images. It represents each facade as a graph over window and door detections with a vertical edge prior, predicts a global floor count, and softly assigns detected elements to latent slots via learnable cross-attention queries. A label-free proposal mechanism based on self-supervised features and vision-language scoring enables training without annotated datasets. The paper has been accepted at IEEE ICIP 2026.
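To make the graph-attention step concrete, here is a minimal single-head sketch in NumPy. It is not the authors' implementation. The GATv2 score follows the standard form (a LeakyReLU applied before the attention vector), but the summary does not define the vertical edge prior, so `vertical_prior` is a hypothetical stand-in: an additive bias that favours pairs of detections at similar image heights. The fully connected graph, the feature sizes, and the box centres are likewise illustrative assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vertical_prior(centers, sigma=0.1):
    """Hypothetical edge prior: log-bias that decays with vertical offset.

    centers: (N, 2) array of normalised (x, y) box centres.
    Returns an (N, N) matrix, 0 for equal heights and negative otherwise.
    """
    dy = centers[:, None, 1] - centers[None, :, 1]
    return -(dy ** 2) / (2.0 * sigma ** 2)

def gatv2_layer(h, bias, w_dst, w_src, a):
    """Single-head GATv2 attention over a fully connected detection graph.

    Score: e_ij = a^T LeakyReLU(W_dst h_i + W_src h_j) + bias_ij
    """
    hi = h @ w_dst                                        # (N, D) receiving nodes
    hj = h @ w_src                                        # (N, D) sending nodes
    e = leaky_relu(hi[:, None, :] + hj[None, :, :]) @ a   # (N, N) raw scores
    alpha = softmax(e + bias, axis=1)                     # normalise over neighbours j
    return alpha @ hj, alpha

# Five made-up detections: two rows of windows and one door.
centers = np.array([[0.2, 0.8], [0.6, 0.8],
                    [0.2, 0.5], [0.6, 0.5],
                    [0.4, 0.2]])
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))        # placeholder per-detection features
w_dst = rng.normal(size=(8, 4))
w_src = rng.normal(size=(8, 4))
a = rng.normal(size=4)

prior = vertical_prior(centers)
out, alpha = gatv2_layer(h, prior, w_dst, w_src, a)
```

A multi-head variant would run several such layers with separate weights and concatenate or average the outputs.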

Key Points
  • GATA2Floor uses GATv2 (Graph Attention v2) to model facades as graphs over window/door detections with a vertical edge prior.
  • Employs learnable cross-attention queries to softly assign detections to latent floor slots, enabling interpretable predictions.
  • Label-free proposal mechanism leverages self-supervised features and vision-language scoring, eliminating the need for annotated datasets.
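
The soft slot assignment in the second point can be sketched as scaled dot-product cross-attention between detection features and a small set of query vectors. This is an illustrative sketch only: the function name `soft_assign`, the number of slots, the random stand-ins for learned queries, and the choice to normalise over slots for each detection are all assumptions, not details confirmed by the summary.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_assign(node_feats, queries):
    """Softly assign N detections to K latent slots.

    node_feats: (N, D) detection features, e.g. the output of a graph layer.
    queries:    (K, D) slot queries (learned in a real model, random here).
    Returns an (N, K) matrix whose rows sum to 1.
    """
    d = node_feats.shape[1]
    logits = node_feats @ queries.T / np.sqrt(d)  # scaled dot-product scores
    return softmax(logits, axis=1)                # assumption: normalise over slots

rng = np.random.default_rng(1)
node_feats = rng.normal(size=(6, 4))   # six hypothetical detections
queries = rng.normal(size=(3, 4))      # three hypothetical latent slots

assign = soft_assign(node_feats, queries)
hard_slot = assign.argmax(axis=1)      # most likely slot for each detection
occupancy = assign.sum(axis=0)         # soft number of detections per slot
```

Reading off `hard_slot` or `occupancy` is one way such assignments could be inspected, which is the sense in which they make predictions interpretable.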

Why It Matters

The approach could enable scalable, automated building analysis for urban planning, energy assessment, and emergency response without costly manual annotation.