CAMMSR: Category-Guided Attentive Mixture of Experts for Multimodal Sequential Recommendation
New model dynamically weights text, images, and video based on user context and item categories.
A research team led by Jinfeng Xu has introduced CAMMSR, a novel AI architecture designed to change how recommendation systems process multimodal data such as text, images, and video. Accepted for publication at ICDE 2026, the model addresses a core limitation of current systems: their reliance on static, heuristic methods to fuse different data types. CAMMSR is built on the observation that a user's preference for an item's image versus its description isn't fixed; it shifts with the item's category and the user's own evolving interests. Modeling this dynamic enables a more nuanced, user-centric approach to content discovery than single-modality or rigidly fused models can achieve.
The technical breakthrough is the Category-guided Attentive Mixture of Experts (CAMoE) module, which learns specialized representations from multiple perspectives and explicitly models inter-modal synergies. It dynamically allocates importance to different data streams, guided by an auxiliary task that predicts item categories. Additionally, the team employs a modality swap contrastive learning task to improve alignment between different data types through sequence-level augmentation. Extensive testing on four public benchmarks shows CAMMSR consistently outperforms existing state-of-the-art models. This paves the way for the next generation of recommendation engines on streaming, e-commerce, and social platforms that can intelligently adapt which product features—a video trailer, a review snippet, or a product image—to emphasize for each individual user.
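To make the fusion idea concrete, here is a minimal, illustrative sketch of category-guided attentive mixture-of-experts fusion. It is not the authors' implementation: the toy linear experts, the dot-product gating against a category embedding, and all variable names (`category_emb`, `modality_embs`, `experts`) are assumptions introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax for the gating weights.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

d = 8  # toy embedding dimension

# Per-modality embeddings (text, image, video) for one item.
modality_embs = {m: rng.normal(size=d) for m in ("text", "image", "video")}

# Hypothetical category embedding, e.g. shaped by the auxiliary
# category-prediction task described in the article.
category_emb = rng.normal(size=d)

# Each "expert" is a toy linear projection specializing one modality view.
experts = {m: rng.normal(size=(d, d)) * 0.1 for m in modality_embs}

# Gating: score each modality by its compatibility with the category
# context, then normalize into dynamic, category-aware fusion weights.
scores = np.array([category_emb @ modality_embs[m] for m in modality_embs])
weights = softmax(scores)

# Fused item representation: weighted sum of the expert outputs.
fused = sum(w * (experts[m] @ modality_embs[m])
            for w, m in zip(weights, modality_embs))

print(weights.round(3), fused.shape)
```

Because the weights are computed from the category context rather than fixed up front, a clothing item might lean on its image embedding while a book leans on its text, which is the kind of adaptive allocation the article describes.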
- Introduces a Category-guided Attentive Mixture of Experts (CAMoE) module for dynamic, context-aware fusion of text, image, and video data.
- Outperforms existing state-of-the-art models on four public datasets, validating its adaptive and synergistic approach.
- Uses an auxiliary category prediction task and a modality swap contrastive learning task to guide fusion and improve cross-modal alignment.
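The modality-swap contrastive task can be sketched as follows, again as an assumption-laden toy rather than the paper's method: at random positions in a user's interaction sequence, the text and image embeddings are swapped to form an augmented view, and an InfoNCE-style loss pulls each original sequence representation toward its own augmented view while pushing it away from other sequences in the batch. The helper names (`swap_modalities`, `info_nce`) and mean-pooling are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def swap_modalities(seq_text, seq_image, p=0.5):
    """Sequence-level augmentation (illustrative): at random positions,
    swap which modality's embedding stands in for the item."""
    mask = rng.random(len(seq_text)) < p
    aug_text = np.where(mask[:, None], seq_image, seq_text)
    aug_image = np.where(mask[:, None], seq_text, seq_image)
    return aug_text, aug_image

def info_nce(anchors, positives, temp=0.1):
    """Contrastive loss: row i of `positives` matches row i of `anchors`;
    all other rows in the batch serve as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temp
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

B, L, d = 4, 5, 8  # batch of user sequences, sequence length, embedding dim
texts = rng.normal(size=(B, L, d))
images = rng.normal(size=(B, L, d))

# Original view: mean-pooled text sequences; positive view: the same
# sequences after modality swapping.
anchors = texts.mean(axis=1)
positives = np.stack([swap_modalities(t, i)[0].mean(axis=0)
                      for t, i in zip(texts, images)])

loss = info_nce(anchors, positives)
print(float(loss))
```

Minimizing such a loss encourages text and image embeddings of the same item sequence to be interchangeable, which is one plausible reading of the cross-modal alignment objective the article summarizes.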
Why It Matters
Enables more personalized and effective recommendations on major platforms by understanding how users truly engage with multimedia content.