Open-SAT: LLM-Guided Query Embedding Refinement for Open-Vocabulary Object Retrieval in Satellite Imagery
Training-free algorithm uses LLMs to align open-ended queries with satellite imagery.
Open-SAT, developed by researchers at a corporate lab, addresses the challenge of open-vocabulary object retrieval in satellite imagery. Traditional vision-language models such as CLIP struggle with natural-language queries that go beyond predefined categories. Open-SAT works in two phases: offline, it uses a VLM to compute embeddings for image tiles and stores them in a vector database; at query time, it uses an LLM to refine the text embedding with contextual details about the target object and its surroundings. A threshold-free retrieval mechanism further improves both accuracy and efficiency.
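The two phases described above can be sketched in a few lines. This is a minimal illustration, not Open-SAT's actual implementation: the `embed` function is a stand-in for a real VLM encoder, the context phrases stand in for LLM-generated refinements, and the blending weight is an assumed free parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(text_or_tile):
    # Stand-in for a real VLM/CLIP-style encoder: maps any input to a
    # unit vector. (Hypothetical; the paper's actual encoder is a VLM.)
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

# --- Offline phase: embed image tiles and store them in a vector "database"
# (here, simply a matrix of unit row vectors).
tiles = [f"tile_{i}" for i in range(100)]
tile_db = np.stack([embed(t) for t in tiles])  # shape (100, 512)

# --- Query phase: refine the query embedding with LLM-supplied context.
query_vec = embed("a small boat")
# Hypothetical refinement: the LLM proposes contextual phrases about the
# target and its surroundings; their embeddings are blended into the query.
context_vecs = np.stack([embed(p) for p in ["harbor", "water near a dock"]])
refined = query_vec + 0.5 * context_vecs.mean(axis=0)  # 0.5 is an assumption
refined /= np.linalg.norm(refined)

# Cosine similarity against all stored tiles (rows are unit vectors).
scores = tile_db @ refined
top5 = np.argsort(scores)[::-1][:5]
print([tiles[i] for i in top5])
```

With real encoders, the offline matrix would live in a vector database and the similarity search would use an approximate-nearest-neighbor index rather than a dense matrix product.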
Experimental results across three public benchmarks show that Open-SAT improves F1 scores by up to 16.04% while retrieving a comparable number of image tiles. The key innovation is its training-free nature—no fine-tuning or additional supervision is needed, making it practical for real-world deployment. By enabling more accurate natural language queries for satellite imagery, Open-SAT could power applications in disaster response, agriculture, and urban planning, where users need to find specific objects or patterns quickly.
- Open-SAT improves F1 score by up to 16.04% on three public satellite image benchmarks.
- It uses an LLM to refine query embeddings at inference time, requiring no additional training.
- The system employs a threshold-free retrieval mechanism for efficient and accurate results.
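The bullets above mention a threshold-free retrieval mechanism without detailing it. One plausible interpretation, sketched here purely as an assumption (the paper's actual mechanism is not specified in this summary), is to avoid a fixed similarity cutoff by sorting scores and cutting at the largest drop between consecutive values:

```python
import numpy as np

def threshold_free_cutoff(scores):
    """Select matches without a fixed similarity threshold: sort scores
    descending and cut at the largest gap between consecutive values.
    (Hypothetical heuristic, not Open-SAT's documented mechanism.)"""
    order = np.argsort(scores)[::-1]          # indices, best score first
    sorted_scores = scores[order]
    gaps = sorted_scores[:-1] - sorted_scores[1:]
    cut = int(np.argmax(gaps)) + 1            # keep everything above the gap
    return order[:cut]

scores = np.array([0.91, 0.88, 0.87, 0.42, 0.40, 0.39])
print(threshold_free_cutoff(scores))  # keeps the three high-scoring tiles
```

A gap-based rule like this adapts to each query's score distribution, which is one way a system could retrieve "a comparable number of image tiles" while improving F1 without per-query threshold tuning.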
Why It Matters
Enables accurate natural language search over satellite imagery without retraining, unlocking new applications in remote sensing.