Research & Papers

Zero-Shot Satellite Image Retrieval through Joint Embeddings: Application to Crisis Response

31.6% accuracy within 50 km on disaster-location queries, with no paired training data

Deep Dive

A new paper introduces GeoQuery, a zero-shot retrieval system for natural-language search of satellite imagery that avoids expensive paired training data. Traditional contrastive models such as CLIP require millions of image-caption pairs, which are scarce in remote sensing. GeoQuery sidesteps this with prompt-aligned text proxies: it generates textual descriptions for a 100k-tile proxy subset of global Sentinel-2 imagery and optimizes the generation prompt so that the resulting text embeddings correlate with visual embeddings from the frozen CLAY foundation model.
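The prompt-alignment idea can be sketched as a search over candidate caption prompts, scored by how well the caption embeddings track the frozen visual embeddings. Everything below is a toy stand-in, not the paper's implementation: the random arrays replace real CLAY and text-encoder outputs, and `embed_captions` is a hypothetical placeholder for "caption each tile with this prompt, then embed the caption".

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder proxy set: in the real system these would be frozen CLAY
# embeddings of the 100k Sentinel-2 proxy tiles.
N_PROXY, DIM = 1000, 64
visual_emb = rng.normal(size=(N_PROXY, DIM))
visual_emb /= np.linalg.norm(visual_emb, axis=1, keepdims=True)

def embed_captions(prompt_id: int) -> np.ndarray:
    """Stand-in for 'generate captions with this prompt, then embed them'.
    Prompt 2 is constructed to correlate most with the visual embeddings."""
    noise = rng.normal(size=(N_PROXY, DIM))
    weight = {0: 0.0, 1: 0.3, 2: 0.9}[prompt_id]
    emb = weight * visual_emb + (1 - weight) * noise
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def alignment_score(text_emb: np.ndarray, vis_emb: np.ndarray) -> float:
    # Mean cosine similarity between each tile's caption embedding and its
    # visual embedding: the quantity the prompt optimization maximizes.
    return float(np.mean(np.sum(text_emb * vis_emb, axis=1)))

# Prompt "optimization" reduced to a search over three candidate prompts.
best = max(range(3), key=lambda p: alignment_score(embed_captions(p), visual_emb))
print(best)  # → 2, the prompt whose captions best track the visual space
```

The point of the sketch is the objective, not the search strategy: any prompt-tuning method could sit on top, as long as it maximizes the text-visual alignment score over the proxy subset.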

Queries are resolved in two stages: first a text-similarity search over the 100k proxy subset, then a visual nearest-neighbor search over worldwide CLAY embeddings. On 76 disaster-location queries covering UK floods, US wildfires, and US droughts, GeoQuery achieves 31.6% accuracy within 50 km, with 50% accuracy on floods where RGB terrain features are well captured. The system was deployed within ECHO, a crisis response platform using Agentic Action Graphs, and successfully identified vulnerable areas during Brisbane's 2025 Cyclone Alfred. Downstream flood simulations reproduced historical patterns, demonstrating practical operational value.
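The two-stage resolution above can be sketched with plain cosine similarity over precomputed embedding stores. The arrays and the hand-off between stages (here, averaging the matched proxies' visual embeddings into one visual query) are assumptions for illustration; the paper does not specify this aggregation.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 32

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical precomputed stores (tiny sizes for the sketch): text embeddings
# of the proxy captions, CLAY visual embeddings of the same proxy tiles, and
# CLAY embeddings of the full worldwide archive.
proxy_text = normalize(rng.normal(size=(500, DIM)))
proxy_visual = normalize(rng.normal(size=(500, DIM)))
world_visual = normalize(rng.normal(size=(5000, DIM)))

def retrieve(query_text_emb, k_proxy=10, k_final=5):
    # Stage 1: text-similarity search over the proxy subset.
    text_sims = proxy_text @ query_text_emb
    top_proxies = np.argsort(-text_sims)[:k_proxy]
    # Stage 2: switch to the matched proxies' *visual* embeddings and run a
    # nearest-neighbor search over the worldwide archive in CLAY space.
    query_visual = normalize(proxy_visual[top_proxies].mean(axis=0))
    vis_sims = world_visual @ query_visual
    return np.argsort(-vis_sims)[:k_final]

query = normalize(rng.normal(size=DIM))
hits = retrieve(query)
print(hits.shape)  # → (5,)
```

The design point is that only the small proxy subset ever needs text embeddings; the worldwide archive is indexed purely in the visual embedding space, which is what makes global coverage affordable.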

Key Points
  • GeoQuery uses prompt-aligned text proxies with a 100k Sentinel-2 subset to align CLAY embeddings without paired training.
  • Achieves 31.6% accuracy within 50 km across 76 disaster queries; 50% accuracy on flood events.
  • Deployed in ECHO crisis response system during Cyclone Alfred (2025), enabling downstream flood simulations.
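The "accuracy within 50 km" metric implies a great-circle distance check between each predicted tile location and the ground-truth disaster coordinates. A minimal sketch using the standard haversine formula (the toy coordinates below are illustrative, not from the paper's benchmark):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    R = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def accuracy_within(preds, truths, threshold_km=50.0):
    # Fraction of queries whose predicted location lands within the threshold.
    hits = sum(haversine_km(*p, *t) <= threshold_km for p, t in zip(preds, truths))
    return hits / len(truths)

# Toy check: one prediction ~20 km off, one ~500 km off.
preds = [(51.5, -0.1), (40.0, -100.0)]
truths = [(51.68, -0.1), (44.5, -100.0)]
print(accuracy_within(preds, truths))  # → 0.5
```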

Why It Matters

Enables rapid satellite image search for disaster response, bypassing costly paired data requirements.