UniMark: Unified Adaptive Multi-bit Watermarking for Autoregressive Image Generators
The training-free framework embeds multi-bit messages into images produced by autoregressive generators such as LlamaGen and VAR.
A research team led by Yigit Yilmaz and Elena Petrova has introduced UniMark, a watermarking framework for autoregressive (AR) image generators such as LlamaGen and VAR. The system addresses three limitations of existing methods: they cannot embed multi-bit messages, they rely on static codebooks that are vulnerable to security attacks, and they do not generalize across different AR architectures. Because UniMark is training-free, it can be applied to existing models without retraining.
UniMark has three components. First, Adaptive Semantic Grouping (ASG) dynamically partitions the model's codebook based on semantic similarity and a secret key, which is meant to preserve image fidelity and keep the partition secure even if the codebook is exposed. Second, Block-wise Multi-bit Encoding (BME) divides the token sequence into blocks and encodes different message bits across them, using error-correcting codes for reliable extraction. Third, a Unified Token-Replacement Interface (UTRI) abstracts the embedding process so that it supports both next-token and next-scale prediction paradigms.
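The first two components can be illustrated with a minimal Python sketch. It is not the authors' implementation: greedy nearest-neighbour pairing stands in for semantic grouping, a repetition code with majority voting stands in for the unspecified error-correcting code, and it rewrites already-sampled token ids rather than intervening during sampling. All function names, the toy embeddings, and the block size are hypothetical.

```python
import hashlib
import random


def pair_tokens(embeddings):
    """Greedily pair each token id with its nearest unpaired neighbour.

    Assumption: a stand-in for semantic grouping of codebook entries.
    """
    unpaired = list(range(len(embeddings)))
    pairs = []
    while len(unpaired) >= 2:
        a = unpaired.pop(0)
        b = min(
            unpaired,
            key=lambda j: sum((x - y) ** 2 for x, y in zip(embeddings[a], embeddings[j])),
        )
        unpaired.remove(b)
        pairs.append((a, b))
    return pairs


def keyed_partition(pairs, secret_key):
    """Map token id -> (bit, partner id); the key decides which pair member means 1."""
    seed = int.from_bytes(hashlib.sha256(secret_key).digest()[:8], "big")
    rng = random.Random(seed)
    table = {}
    for a, b in pairs:
        if rng.random() < 0.5:
            a, b = b, a
        table[a] = (0, b)
        table[b] = (1, a)
    return table


def embed(tokens, message_bits, table, block_size):
    """Block i carries message bit i (cycling); mismatched tokens become their partner."""
    out = []
    for pos, tok in enumerate(tokens):
        bit = message_bits[(pos // block_size) % len(message_bits)]
        entry = table.get(tok)  # tokens left unpaired (odd codebook) pass through
        out.append(tok if entry is None or entry[0] == bit else entry[1])
    return out


def extract(tokens, num_bits, table, block_size):
    """Recover each bit by majority vote over every token in the blocks that carry it."""
    votes = [[0, 0] for _ in range(num_bits)]
    for pos, tok in enumerate(tokens):
        entry = table.get(tok)
        if entry is not None:
            votes[(pos // block_size) % num_bits][entry[0]] += 1
    return [1 if ones > zeros else 0 for zeros, ones in votes]
```

In this toy setup, a token that falls in the wrong group is swapped for its nearest neighbour, which is why the replacement is expected to change the image only slightly. Because each message bit is repeated across many tokens and blocks, a minority of corrupted tokens does not change the majority vote.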
In the reported experiments, UniMark achieves state-of-the-art performance in image quality (measured by FID), watermark detection accuracy, and multi-bit message extraction. It also remains robust to common attacks, including cropping, JPEG compression, Gaussian noise, blur, color jitter, and random erasing. This combination of high capacity, security, and cross-model compatibility is a meaningful step toward practical AI content provenance.
- Embeds multi-bit messages (not just binary verification) directly into AI-generated images during the autoregressive generation process.
- Uses Adaptive Semantic Grouping with a secret key for dynamic codebook partitioning that stays secure even if the codebook is exposed.
- Provides a Unified Token-Replacement Interface that supports both next-token (e.g., LlamaGen) and next-scale (e.g., VAR) prediction.
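One way to picture a unified token-replacement interface is a single replacement callback shared by both generation paradigms. The Python sketch below is an assumption about how such an adapter could look, not UniMark's actual API: the `Replace` type and both function names are hypothetical, and the tokens are treated as already-sampled ids.

```python
from typing import Callable, List, Sequence

# Hypothetical callback: (token id, global position) -> possibly replaced token id.
Replace = Callable[[int, int], int]


def watermark_next_token(tokens: Sequence[int], replace: Replace) -> List[int]:
    """Next-token case: one flat sequence, positions are plain indices."""
    return [replace(tok, pos) for pos, tok in enumerate(tokens)]


def watermark_next_scale(scales: Sequence[Sequence[int]], replace: Replace) -> List[List[int]]:
    """Next-scale case: one token map per scale, positions keep counting across scales."""
    out: List[List[int]] = []
    offset = 0
    for scale in scales:
        out.append([replace(tok, offset + i) for i, tok in enumerate(scale)])
        offset += len(scale)
    return out
```

With this design, the watermarking logic lives entirely in the callback, so the same embedding rule sees the same global positions whether the tokens arrive as one sequence or as a stack of scales.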
Why It Matters
UniMark offers a practical, secure way to trace AI-generated content and protect intellectual property at scale.