Research & Papers

ICG framework personalizes cover images with MLLM prompting and preference alignment

AI-generated covers that match your taste, with no labels required.

Deep Dive

Researchers have introduced ICG (Improving Cover Image Generation), a framework that combines multimodal large language models (MLLMs) with diffusion models to generate personalized cover images. Previous approaches relied on handcrafted prompts and disjointed modules; ICG instead connects the MLLM to the diffusion model through an adapter, so the whole pipeline can be trained end to end. The system uses meta tokens to extract semantic features from item titles and reference images, then refines those features with user embeddings to inject personalized context into the diffusion process. As a result, the generated covers reflect both the item's content and the user's preferences. To work around the lack of labeled supervision, ICG uses a multi-reward learning strategy that combines publicly available aesthetic and relevance rewards with a personalized preference model derived from actual user behavior.
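To make the multi-reward idea concrete, here is a minimal sketch of how three reward scores might be folded into one scalar and used to rank candidate covers. It is an illustration under stated assumptions, not the authors' implementation: the summary does not say how ICG weights or combines its rewards, so the weighted average, the function names (`combine_rewards`, `rank_candidates`), the weights, and the scores below are all hypothetical.

```python
from typing import Dict, List


def combine_rewards(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of named reward scores.

    ASSUMPTION: a weighted average is only one plausible way to merge
    rewards; the summary does not state the rule ICG actually uses.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("reward weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in scores) / total_weight


def rank_candidates(
    candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[str]:
    """Return candidate ids ordered from highest to lowest combined reward."""
    return sorted(
        candidates,
        key=lambda cid: combine_rewards(candidates[cid], weights),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical weights: the personalized preference reward counts double.
    weights = {"aesthetic": 1.0, "relevance": 1.0, "preference": 2.0}

    # Made-up scores in [0, 1] for two candidate covers shown to one user.
    candidates = {
        "cover_a": {"aesthetic": 0.9, "relevance": 0.8, "preference": 0.2},
        "cover_b": {"aesthetic": 0.6, "relevance": 0.7, "preference": 0.9},
    }

    print(rank_candidates(candidates, weights))  # ['cover_b', 'cover_a']
```

In this toy setup the less polished cover wins because the user-specific preference score outweighs the generic aesthetic score. This shows how a personalized reward can change the outcome without any ground-truth label for the "right" cover.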

According to the reported experiments, ICG improves image quality, semantic fidelity, and personalization over baselines, and its covers yield stronger user appeal and better offline recommendation accuracy in downstream tasks. A key advantage is the plug-and-play design: ICG is compatible with common diffusion model checkpoints and requires no ground-truth labels during optimization. The work was published at EMNLP 2025. It points to practical applications in e-commerce, streaming services, and social media, where personalized cover images could raise click-through rates and user engagement.

Key Points
  • ICG integrates MLLMs with diffusion models via a trainable adapter for end-to-end personalized cover generation without handcrafted prompts.
  • A multi-reward learning strategy combines aesthetic, relevance, and personalized preference rewards to optimize quality without needing ground-truth labels.
  • The framework improves image quality, semantic fidelity, and personalization over baselines, which translates into stronger user appeal and better offline recommendation accuracy.

Why It Matters

Cover images tailored to individual users can increase engagement and recommendation effectiveness on digital platforms such as e-commerce and streaming services.