New AI codec compresses images semantic-first, bits decoded coarse-to-fine

arXiv eess.IV May 12, 2026

⚡A single bitstream lets AI first see 'animal', then 'dog', then 'poodle'.

Deep Dive

In a paper accepted at ICIP 2026, researchers (presumably from Yonsei University) introduce a semantic hierarchy-aware progressive image codec. Existing learned image compression (LIC) systems and progressive codecs order their refinement by sample-level difficulty (easy-to-hard). This work instead reframes progressive transmission as 'semantic scalability': the bitstream refines meaning from coarse to fine. The authors first use CLIP embeddings to group the ImageNet-1K classes into a semantic hierarchy (e.g., animal → mammal → dog → poodle). Then, within a channel-wise autoregressive framework, they decompose the latent representation into hierarchically ordered channel blocks, each explicitly trained to reconstruct the image at a specific semantic granularity.
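To make the channel-block idea concrete, here is a minimal sketch of how a receiver might use only the blocks it has decoded so far. It is not the authors' implementation: the block names, the channel counts in BLOCKS, and the choice to zero-fill undecoded channels are all assumptions made for illustration.

```python
# Hypothetical layout: channels per semantic level, in bitstream order.
# The paper's actual block sizes are not stated in this summary.
BLOCKS = {"coarse": 4, "mid": 4, "fine": 8}

def block_slices(blocks):
    """Map each level name to its (start, stop) channel range."""
    slices, start = {}, 0
    for name, width in blocks.items():
        slices[name] = (start, start + width)
        start += width
    return slices

def truncate_latent(latent, blocks, upto):
    """Keep channels up to and including level `upto`.

    `latent` is a list of channels, each a list of floats.
    Channels from later blocks are zero-filled (an assumption),
    mimicking a decoder that has not received those bits yet.
    """
    stop = block_slices(blocks)[upto][1]
    return [list(ch) if i < stop else [0.0] * len(ch)
            for i, ch in enumerate(latent)]

# Toy latent: 16 channels with 2 values each.
latent = [[1.0, 2.0] for _ in range(16)]
coarse_only = truncate_latent(latent, BLOCKS, "coarse")
```

Under this layout, stopping at "coarse" leaves the first 4 channels intact and blanks the remaining 12, while decoding through "fine" restores the full latent.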

The reported experiments show that the method significantly improves coarse-level recognition (e.g., animal vs. vehicle) at very low bitrates while maintaining fine-grained accuracy (e.g., breed-level) as more bits are received, outperforming existing progressive codecs under hierarchical evaluation. The approach offers an interpretable and efficient route to task-adaptive image coding. It suits machine perception systems that prioritize semantic understanding over pixel-perfect reconstruction, such as autonomous driving, surveillance, or edge AI, where bandwidth is limited and decisions must be made incrementally.
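Hierarchical evaluation, as described above, scores a prediction at more than one level of the class tree. The sketch below shows one plausible way to compute it for a two-level hierarchy. The PARENT map, the labels, and the predictions are invented for illustration and are not results from the paper.

```python
# Hypothetical two-level hierarchy: fine label -> coarse label.
PARENT = {
    "poodle": "animal", "beagle": "animal", "tabby": "animal",
    "sedan": "vehicle", "pickup": "vehicle",
}

def hierarchical_accuracy(predicted, truth, parent):
    """Return (coarse_accuracy, fine_accuracy) for paired fine-level labels.

    A prediction counts as coarse-correct when its parent class matches
    the parent of the true label, even if the fine label is wrong.
    """
    n = len(truth)
    pairs = list(zip(predicted, truth))
    fine = sum(p == t for p, t in pairs) / n
    coarse = sum(parent[p] == parent[t] for p, t in pairs) / n
    return coarse, fine

truth = ["poodle", "beagle", "sedan", "tabby"]
# Made-up classifier outputs on images decoded at two bitrates.
low_rate_preds = ["beagle", "poodle", "pickup", "sedan"]
high_rate_preds = ["poodle", "beagle", "sedan", "poodle"]

low = hierarchical_accuracy(low_rate_preds, truth, PARENT)
high = hierarchical_accuracy(high_rate_preds, truth, PARENT)
```

In this toy data the low-rate predictions get no fine labels right yet still score 0.75 at the coarse level, which is the kind of gap a semantic-first codec aims to exploit.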

Key Points

Uses CLIP embeddings to build a semantic hierarchy of ImageNet-1K classes (e.g., coarse 'animal' → fine 'dog breed').
Decomposes latent representations into hierarchical channel blocks, each optimized for a specific semantic level.
Outperforms existing progressive codecs in hierarchical evaluation, boosting coarse recognition at low bitrates without sacrificing fine-grained accuracy at high bitrates.

Why It Matters

Enables bandwidth-efficient AI vision systems that can first identify broad categories before refining, saving time and data.
