Discrete Preference Learning for Personalized Multimodal Generation
A new two-stage framework turns user interactions into discrete tokens for consistent, personalized multimodal content.
A research team from multiple institutions has introduced DPPMG (Discrete Preference learning for Personalized Multimodal Generation), a framework that addresses two key limitations of current personalized generative models: the mismatch between continuous user-preference representations and the discrete token inputs expected by generators such as GPT-4 and Stable Diffusion, and the risk of inconsistency between the images and text a system generates. Their solution is a two-stage pipeline: a modal-specific graph neural network first learns each user's preferences from their multimodal interactions, and those preferences are then quantized into discrete tokens that can be injected into downstream text and image generators.
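The quantization step in the second stage can be sketched as a nearest-neighbor codebook lookup, as in vector-quantization methods. This is a minimal illustrative sketch, not the paper's actual code: the function name, embedding sizes, and codebook shape are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_preferences(user_embeddings, codebook):
    """Map each continuous preference vector to its nearest codebook entry.

    The returned indices act as discrete 'preference tokens' that a
    token-based generator can consume; the returned embeddings are the
    corresponding codebook vectors. (Illustrative sketch only.)
    """
    # Pairwise squared distances, shape (n_users, n_codes)
    dists = ((user_embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    token_ids = dists.argmin(axis=1)   # one discrete token per user
    quantized = codebook[token_ids]    # embeddings injected downstream
    return token_ids, quantized

# Hypothetical sizes: 256 learnable preference tokens of dimension 64,
# and 8 users whose embeddings come from the GNN encoder.
codebook = rng.normal(size=(256, 64))
user_embeddings = rng.normal(size=(8, 64))
tokens, quantized = quantize_preferences(user_embeddings, codebook)
```

In practice the codebook would be learned jointly with the encoder (e.g. with a straight-through gradient estimator), but the lookup above captures the continuous-to-discrete conversion the framework relies on.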
To keep the generated content both personalized and consistent across modalities, the researchers designed a cross-modal consistent and personalized reward that fine-tunes the token-associated parameters during training, preserving individual user preferences while keeping generated images and text semantically aligned. Extensive experiments on two real-world datasets show significant improvements over existing methods in generating personalized, coherent multimodal content. The paper has been accepted for publication at SIGIR 2026, a premier conference in information retrieval, indicating its potential impact on next-generation recommendation systems and personalized content-creation tools.
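One plausible shape for such a reward is a weighted mix of a cross-modal consistency term (image and text embeddings should agree) and a personalization term (outputs should match the user's preference embedding). The sketch below is an assumption about that structure, not the paper's actual reward; the embeddings would in practice come from a pretrained cross-modal encoder such as CLIP.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def combined_reward(img_emb, txt_emb, user_pref_emb, alpha=0.5):
    """Hypothetical cross-modal consistent-and-personalized reward.

    alpha weights cross-modal consistency (image vs. text similarity)
    against personalization (text vs. user-preference similarity).
    """
    consistency = cosine(img_emb, txt_emb)
    personalization = cosine(txt_emb, user_pref_emb)
    return alpha * consistency + (1 - alpha) * personalization
```

During fine-tuning, a reward like this would scale the gradient applied to the token-associated parameters (e.g. in a REINFORCE-style update), so that token embeddings drift toward outputs that are both on-preference and mutually consistent.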
- Uses a modal-specific graph neural network to learn user preferences from multimodal interactions and quantizes them into discrete tokens
- Addresses the architecture gap by converting continuous preferences to discrete tokens compatible with generators like GPT and Stable Diffusion
- Implements a cross-modal consistency reward to fine-tune parameters, ensuring personalized yet coherent text-image outputs
Why It Matters
Enables AI systems to generate truly personalized, consistent text and images together, advancing recommendation engines and creative tools.