LatentBox cuts AI image storage by 78.7% with latent-first approach

AI-generated images are stored as compressed latents and reconstructed into pixels only when requested.

Deep Dive

The explosive growth of AI-generated images has created a sustainability crisis for storage infrastructure. Platforms like Midjourney and Adobe Firefly host billions of generative images, but conventional object stores persist them as full-resolution pixel blobs, consuming massive storage and bandwidth. Unlike natural photos, AI-generated images can be deterministically reconstructed from compact, model-native latent tensors, making persistent pixel storage fundamentally redundant. In response, researchers from multiple universities developed LatentBox, a latent-first storage system that treats compressed latents as durable storage objects and uses on-demand GPU reconstruction on the read path to trade inexpensive compute for large persistent storage savings.
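The read path described above can be sketched as a minimal latent-first object store. Everything here is an assumption for illustration. `LatentStore`, `put_latent`, `get_image`, and `toy_decode` are hypothetical names. `zlib` stands in for whatever latent compression LatentBox actually uses. `toy_decode` only imitates a model decoder by deterministically expanding each latent byte. The sketch shows the core trade: only compressed latents are persisted, and pixels are recomputed on every read.

```python
import zlib
from typing import Callable, Dict


def toy_decode(latent: bytes) -> bytes:
    # Hypothetical stand-in for a model's latent decoder (run on a GPU in a real
    # system): deterministically expands each latent byte into 48 output bytes,
    # mimicking the size blow-up from latent tensor to pixel image.
    return bytes(b for b in latent for _ in range(48))


class LatentStore:
    """Hypothetical latent-first store: persists latents, decodes pixels on read."""

    def __init__(self, decode: Callable[[bytes], bytes]):
        self._blobs: Dict[str, bytes] = {}  # key -> compressed latent bytes
        self._decode = decode

    def put_latent(self, key: str, latent: bytes) -> None:
        # Only the compressed latent becomes a durable object; no pixels are kept.
        self._blobs[key] = zlib.compress(latent)

    def get_image(self, key: str) -> bytes:
        # On-demand reconstruction: decompress the latent, then decode to pixels.
        return self._decode(zlib.decompress(self._blobs[key]))

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self._blobs.values())


store = LatentStore(toy_decode)
latent = bytes(range(256)) * 16  # 4096-byte hypothetical latent
store.put_latent("img-001", latent)
image = store.get_image("img-001")
print(len(image), store.stored_bytes())
```

Because the decoder is deterministic, repeated reads return identical bytes. That property is what allows the pixel copy to be dropped from persistent storage.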

The design is guided by the first large-scale analysis of AI-generated image access, based on a 35-month, 2-billion-request production trace from a major generative-content platform. LatentBox keeps frequently accessed images in decoded pixel format for fast hits, stores less-active objects as compressed latents to expand effective cache capacity, and continuously adjusts the split between the image and latent caches to minimize user-perceived access latency. In a prototype evaluation replaying the production trace, LatentBox reduces persistent storage by 78.7% while delivering mean and tail latency comparable to, or lower than, a pure image-based storage system.
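The two-tier cache can be sketched in Python under stated assumptions. The latency figures, the fixed per-object sizes, and the LRU policy are hypothetical. So is the rule that pixels evicted from the image cache are demoted to the latent cache; none of this is LatentBox's published design. The system adjusts its split continuously, but the hypothetical `tune_split` takes a simpler offline approach. It replays a trace under a few candidate image/latent splits and keeps the one with the lowest mean latency.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Latency:
    # Hypothetical per-request costs in milliseconds (not figures from the paper).
    image_hit: float = 1.0  # serve decoded pixels from the image cache
    decode: float = 20.0    # GPU-decode a cached latent into pixels
    fetch: float = 50.0     # read the latent from the persistent store


class LRU:
    """Byte-budgeted LRU cache; put() returns the keys it had to evict."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0
        self.items: "OrderedDict[str, int]" = OrderedDict()  # key -> size

    def touch(self, key: str) -> bool:
        if key not in self.items:
            return False
        self.items.move_to_end(key)  # mark as most recently used
        return True

    def remove(self, key: str) -> bool:
        size = self.items.pop(key, None)
        if size is None:
            return False
        self.used -= size
        return True

    def put(self, key: str, size: int) -> List[str]:
        self.remove(key)
        if size > self.capacity:
            return [key]  # cannot hold it at all: treat as immediately evicted
        evicted = []
        while self.used + size > self.capacity:
            victim, victim_size = self.items.popitem(last=False)  # least recent
            self.used -= victim_size
            evicted.append(victim)
        self.items[key] = size
        self.used += size
        return evicted


class HybridCache:
    """Image cache in front of a latent cache, sharing one byte budget."""

    def __init__(self, total_bytes: int, image_fraction: float,
                 image_size: int, latent_size: int, lat: Latency = Latency()):
        image_bytes = int(total_bytes * image_fraction)
        self.images = LRU(image_bytes)
        self.latents = LRU(total_bytes - image_bytes)
        self.image_size = image_size
        self.latent_size = latent_size
        self.lat = lat

    def _admit_image(self, key: str) -> None:
        # Pixels evicted from the image cache are demoted to the compact latent
        # cache; latent-cache evictions are simply dropped (the latent persists).
        for victim in self.images.put(key, self.image_size):
            self.latents.put(victim, self.latent_size)

    def access(self, key: str) -> float:
        if self.images.touch(key):
            return self.lat.image_hit
        if self.latents.remove(key):  # latent hit: decode and promote to pixels
            self._admit_image(key)
            return self.lat.decode
        self._admit_image(key)  # miss: fetch latent, decode, cache the pixels
        return self.lat.fetch + self.lat.decode


def tune_split(trace: Sequence[str], total_bytes: int, fractions: Sequence[float],
               image_size: int, latent_size: int) -> Tuple[float, float]:
    """Replay the trace under each candidate split; return (best fraction, mean latency)."""
    results = []
    for f in fractions:
        cache = HybridCache(total_bytes, f, image_size, latent_size)
        total = sum(cache.access(k) for k in trace)
        results.append((total / len(trace), f))
    mean, best = min(results)
    return best, mean


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "d", "a", "e", "a", "b"]
    print(tune_split(trace, 400, [0.0, 0.25, 0.5, 0.75, 1.0], 100, 10))
```

The replay exposes the trade-off. In the usage example, a latent occupies a tenth of an image's bytes, so a larger latent share fits more objects into the same budget. The cost is that those hits pay the decode price. The best split therefore depends on how much of the hot working set fits as pixels. An online system would re-evaluate this as access patterns drift.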

Key Points
  • LatentBox stores AI-generated images as compressed latents (model-native tensors) instead of pixels, cutting persistent storage by 78.7%.
  • Its design is guided by a 35-month production trace of 2 billion requests from a major generative-content platform.
  • Hybrid cache: frequently accessed images are kept as decoded pixels for speed, and less-active ones as latents for capacity, with the split between the two tuned continuously.

Why It Matters

By trading inexpensive GPU decode compute for persistent storage, AI image platforms could cut their storage footprint by roughly four-fifths without sacrificing access latency, enabling more sustainable scaling.