Text-RSIR cuts satellite image transmission to about 2% of the original volume using text descriptions
Send a low-resolution image plus a short text description, then reconstruct the high-resolution scene with AI.
Text-RSIR, developed by Hao Yang, Xianping Ma, Peifeng Ma, and Man-On Pun, tackles a fundamental bottleneck in remote sensing: moving massive high-resolution imagery over bandwidth-limited links. Instead of pushing full pixel data through the link, the system equips the satellite or UAV with an onboard text generator that produces short descriptions of the image's spatial features and semantic content. Only these descriptions and a low-resolution version of the image are transmitted, which reduces the data sent to roughly 2% of the original volume. On the ground, a text-conditioned image restoration model uses cross-modal learning to recover fine details from the low-resolution image while keeping the result semantically consistent with the text, producing images intended to be both usable for analysis and visually faithful.
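The roughly 2% figure can be sanity-checked with a back-of-the-envelope byte count. The sketch below is not the authors' pipeline: the 256x256 image size, the 8x downsampling factor, the 200-byte caption, and uncompressed 8-bit RGB storage are all assumptions chosen for illustration, and `transmitted_fraction` is a hypothetical helper.

```python
def transmitted_fraction(height, width, channels, downsample, caption_bytes):
    """Fraction of the original bytes sent as low-res image plus caption.

    Assumes uncompressed storage at one byte per channel per pixel
    (an illustrative assumption, not a detail from the paper).
    """
    full_bytes = height * width * channels
    low_res_bytes = (height // downsample) * (width // downsample) * channels
    return (low_res_bytes + caption_bytes) / full_bytes


# Hypothetical example: 256x256 RGB tile, 8x downsampling, 200-byte caption.
fraction = transmitted_fraction(256, 256, 3, downsample=8, caption_bytes=200)
print(f"{fraction:.2%} of the original bytes")  # prints 1.66% of the original bytes
```

Under these assumptions the low-resolution image dominates the payload (3,072 of 3,272 bytes), so the text description adds very little to the transmission cost.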
The framework was validated on three datasets, with reported PSNR of 16.36 dB on Alsat-2B, 26.87 dB on UC Merced Land Use, and 27.41 dB on Aerial Image. The results suggest that reconstruction quality can remain viable for environmental monitoring and urban mapping even at extreme compression ratios, although the Alsat-2B score is markedly lower than the other two. The authors plan to release the implementation on GitHub. By swapping heavy pixel data for lightweight text, Text-RSIR could enable real-time or near-real-time satellite analytics from low-bandwidth ground stations, drones, or IoT devices, making high-resolution Earth observation far more accessible.
- Transmits data as low-resolution images + text, reducing volume to ~2% of original size
- Achieves reconstruction PSNR of 16.36 dB (Alsat-2B), 26.87 dB (UC Merced), and 27.41 dB (Aerial Image)
- Text-conditioned model uses cross-modal learning to restore spatial details while preserving semantic coherence
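For readers unfamiliar with the metric, PSNR (peak signal-to-noise ratio) compares a restored image with its reference through the mean squared error: PSNR = 10 * log10(MAX^2 / MSE). The sketch below is a minimal, dependency-free illustration rather than the authors' evaluation code; the peak value of 255 (8-bit pixels) is an assumption, since the summary does not state the bit depth, and both function names are hypothetical.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_from_psnr(psnr_db, max_value=255.0):
    """Invert the PSNR formula to get the root-mean-square pixel error."""
    return max_value / (10.0 ** (psnr_db / 20.0))


# A constant error of 25.5 gray levels on an 8-bit scale gives exactly 20 dB.
reference = [0.0] * 16
restored = [25.5] * 16
print(psnr(reference, restored))
```

Assuming an 8-bit peak, the reported 16.36 dB corresponds to an RMS error of roughly 39 gray levels, while 27.41 dB corresponds to roughly 11, which gives a sense of how much further the Alsat-2B reconstructions are from their references.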
Why It Matters
Makes high-resolution satellite imagery recoverable over low-bandwidth links, which is critical for real-time environmental monitoring and disaster response.