egenioussBench: A New Dataset for Geospatial Visual Localization
42 non-overlapping test images with centimeter-accurate ground truth, drawn from 2,709 smartphone photos
egenioussBench tackles a core challenge in computer vision: geospatial visual localization, the task of determining where a photo was taken by registering it against a reference 3D map. Unlike traditional benchmarks that rely on structure-from-motion (SfM) reconstructions, egenioussBench leverages deployable city-scale assets: an airborne 3D mesh and a CityGML LoD2 model. This design scales to entire cities and reflects the real-world mapping data used by urban planners and autonomous systems.
The query set consists of 2,709 smartphone images captured in an urban environment, each paired with a centimeter-accurate ground-truth pose derived from PPK (post-processed kinematic) GPS and ground control point adjustments. A co-visibility matrix computed from rendered depth is used to identify a maximum independent set of queries, yielding a test split of 42 non-overlapping images with withheld ground truth and a validation split of 412 sequential images.
The benchmark also introduces a public leaderboard that reports binning metrics at multiple pose-error thresholds alongside global statistics such as median error, RMSE, and outlier ratio, enabling fair, like-for-like comparisons between methods that use either the mesh or the LoD2 reference data. Code and data are publicly available to support research on large-scale cross-domain localization.
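To make the split construction concrete, here is a minimal sketch of how a non-overlapping test set could be selected from a co-visibility matrix. Exact maximum independent set is NP-hard, so this uses a standard greedy minimum-degree heuristic; the function name `greedy_independent_set`, the toy matrix, and the heuristic itself are illustrative assumptions, not necessarily the benchmark's actual procedure.

```python
import numpy as np

def greedy_independent_set(covis: np.ndarray) -> list[int]:
    """Greedily select images so that no two selected images co-observe.

    covis: (N, N) boolean matrix; covis[i, j] is True if images i and j
    share visible scene content (e.g. from rendered depth). Diagonal ignored.
    """
    adj = covis.copy()
    np.fill_diagonal(adj, False)
    remaining = set(range(adj.shape[0]))
    selected = []
    while remaining:
        # Pick the remaining image with the fewest conflicts (lowest degree),
        # a standard greedy heuristic for independent sets.
        i = min(remaining, key=lambda k: adj[k, list(remaining)].sum())
        selected.append(i)
        # Drop the chosen image and everything it co-observes.
        remaining -= {i} | set(np.flatnonzero(adj[i]))
    return selected

# Toy 5-image co-visibility matrix: images {0,1,2} overlap in a chain,
# images {3,4} overlap with each other only.
covis = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=bool)
print(greedy_independent_set(covis))  # e.g. [0, 2, 3]
```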
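Along the same lines, here is a minimal sketch of a leaderboard-style evaluation, assuming pose error is split into translation error in meters and rotation error in degrees. The threshold bins, the outlier cut-off, and the names `rotation_error_deg` and `evaluate` are illustrative placeholders, not the benchmark's official values.

```python
import numpy as np

def rotation_error_deg(R_est: np.ndarray, R_gt: np.ndarray) -> float:
    """Angular difference between two 3x3 rotation matrices, in degrees."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def evaluate(t_err_m: np.ndarray, r_err_deg: np.ndarray) -> dict:
    """Bin poses by (translation, rotation) thresholds and add global stats."""
    bins = [(0.05, 1.0), (0.25, 2.0), (1.0, 5.0)]  # illustrative (m, deg) bins
    report = {
        f"recall@{t}m/{r}deg": float(np.mean((t_err_m <= t) & (r_err_deg <= r)))
        for t, r in bins
    }
    report["median_t_m"] = float(np.median(t_err_m))
    report["rmse_t_m"] = float(np.sqrt(np.mean(t_err_m ** 2)))
    # Illustrative outlier definition: worse than the coarsest bin.
    report["outlier_ratio"] = float(np.mean((t_err_m > 1.0) | (r_err_deg > 5.0)))
    return report

# Toy usage with random per-image errors for a 42-image test split;
# in practice r_err_deg would come from rotation_error_deg per image.
rng = np.random.default_rng(0)
print(evaluate(rng.exponential(0.3, 42), rng.exponential(1.5, 42)))
```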
- Dataset includes 2,709 smartphone images with centimeter-accurate ground truth (PPK + GCP)
- Uses city-scale airborne 3D mesh and CityGML LoD2 models, not SfM reconstructions
- Public leaderboard with binning metrics at multiple error thresholds for fair method comparison
Why It Matters
Bridges the gap between computer vision research and real-world geospatial mapping for autonomous navigation and urban planning.