Research & Papers

Open-SAT: LLM-Guided Query Embedding Refinement for Open-Vocabulary Object Retrieval in Satellite Imagery

Training-free algorithm uses LLMs to align open-ended queries with satellite imagery.

Deep Dive

Open-SAT, developed by researchers at a corporate lab, addresses the challenge of open-vocabulary object retrieval in satellite imagery. Traditional vision-language models like CLIP struggle with natural language queries that go beyond predefined categories. The new algorithm works in two phases: offline, it uses a VLM to compute embeddings for image tiles and stores them in a vector database; at query time, it leverages an LLM to refine the text embedding by incorporating contextual details about the target object and its surroundings. A threshold-free retrieval mechanism further boosts accuracy and efficiency.
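The two-phase design can be sketched as follows. This is a minimal illustration of the idea only, not the authors' actual code: every name here (`build_tile_index`, `refined_query_embedding`, `rank_tiles`, and the `embed_image` / `embed_text` / `llm_expand` callables) is an assumed placeholder for the VLM encoders and LLM prompt the paper uses.

```python
import numpy as np

def build_tile_index(tiles, embed_image):
    """Offline phase (assumed sketch): embed each image tile with a VLM
    image encoder and store L2-normalized vectors as a simple in-memory
    stand-in for a vector database."""
    vecs = np.stack([embed_image(t) for t in tiles])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def refined_query_embedding(query, llm_expand, embed_text):
    """Query phase (assumed sketch): an LLM enriches the raw query with
    contextual detail about the target object and its surroundings; the
    expanded description is folded into the original text embedding."""
    expanded = llm_expand(query)
    q = embed_text(query) + embed_text(expanded)
    return q / np.linalg.norm(q)

def rank_tiles(index, q):
    """Score every tile against the refined query by cosine similarity
    (dot product, since both sides are L2-normalized)."""
    sims = index @ q
    return np.argsort(-sims), sims
```

In a real deployment the tile embeddings would live in a persistent vector store and `llm_expand` would be a prompted LLM call; the averaging of original and expanded embeddings is one plausible fusion choice, not necessarily the paper's.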

Experimental results across three public benchmarks show that Open-SAT improves F1 scores by up to 16.04% while retrieving a number of image tiles comparable to baseline methods. The key innovation is its training-free nature: no fine-tuning or additional supervision is needed, making it practical for real-world deployment. By enabling more accurate natural language queries over satellite imagery, Open-SAT could power applications in disaster response, agriculture, and urban planning, where users need to find specific objects or patterns quickly.

Key Points
  • Open-SAT improves F1 score by up to 16.04% on three public satellite image benchmarks.
  • It uses an LLM to refine query embeddings at inference time, requiring no additional training.
  • The system employs a threshold-free retrieval mechanism for efficient and accurate results.
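The article does not spell out how the threshold-free retrieval works. One common way to avoid a hand-tuned similarity threshold is to sort the scores and cut at the largest gap between consecutive scores; the sketch below shows that heuristic as an illustration (the paper's actual mechanism may differ).

```python
import numpy as np

def threshold_free_cutoff(similarities):
    """Select retrieved tiles without a fixed similarity threshold
    (illustrative largest-gap heuristic, not necessarily Open-SAT's):
    sort scores descending and keep everything above the single
    largest drop between neighboring scores."""
    order = np.argsort(-similarities)
    sorted_sims = similarities[order]
    gaps = sorted_sims[:-1] - sorted_sims[1:]  # drop between neighbors
    cut = int(np.argmax(gaps)) + 1             # keep scores above the biggest drop
    return order[:cut]
```

The appeal of a gap-based rule is that the retrieved set adapts per query: a query matching three tiles strongly and the rest weakly returns exactly three, with no threshold to tune per benchmark.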

Why It Matters

Enables accurate natural language search over satellite imagery without retraining, unlocking new applications in remote sensing.