Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion
When sensors fail or data goes missing, this graph retrieval method reconstructs the absent modalities using relevant context drawn from across the whole graph.
Multimodal recommendation systems rely on diverse data types, such as text, images, and video, to personalize recommendations accurately. In real-world deployments, however, missing modalities are common because of sensor failures, annotation costs, or privacy constraints. Existing modality completion methods look only at the node itself or its immediate neighbors, so they miss richer semantic context elsewhere in the graph. A new paper from researchers at the National University of Singapore proposes GRE-MC (Graph Retrieval-Enhanced Modality Completion). GRE-MC first retrieves a subgraph of semantically relevant nodes from the entire graph, not just the neighbors, and then uses a graph transformer with global attention to jointly encode the query node and the retrieved subgraph. A learnable sparse-routing codebook further compresses latent embeddings into compact bases, which improves robustness.
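The retrieval step can be pictured with a minimal Python sketch. The paper's exact retrieval criterion is not described here, so this assumes cosine similarity over pre-computed modality embeddings; the names `retrieve_subgraph` and `cosine` and the toy `graph` are hypothetical. It returns only the node set of the subgraph and ignores edges, which is the point of contrast with neighbor-only completion: any node in the graph is a candidate, provided it observes the modality the query is missing.

```python
import math
from typing import Dict, List, Optional

# Hypothetical node store: node id -> {modality name: embedding, or None if missing}
Graph = Dict[str, Dict[str, Optional[List[float]]]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_subgraph(graph: Graph, query_id: str, missing: str, k: int = 2) -> List[str]:
    """Return up to k node ids ranked by similarity across the WHOLE graph.

    Edges are ignored on purpose: a neighbor-only method would restrict the
    candidates to nodes adjacent to the query. Similarity is averaged over the
    modalities both nodes observe, and only candidates that have the `missing`
    modality are eligible, so they can inform its reconstruction.
    """
    query = graph[query_id]
    observed = [m for m, v in query.items() if v is not None and m != missing]
    scored = []
    for node_id, feats in graph.items():
        if node_id == query_id or feats.get(missing) is None:
            continue
        shared = [m for m in observed if feats.get(m) is not None]
        if not shared:
            continue
        score = sum(cosine(query[m], feats[m]) for m in shared) / len(shared)
        scored.append((score, node_id))
    scored.sort(reverse=True)  # highest similarity first
    return [node_id for _, node_id in scored[:k]]


graph: Graph = {
    "q": {"text": [1.0, 0.0], "image": None},   # query item: image is missing
    "a": {"text": [0.9, 0.1], "image": [0.5, 0.5]},
    "b": {"text": [0.0, 1.0], "image": [0.1, 0.9]},
    "c": {"text": [1.0, 0.05], "image": None},  # similar text, but no image to offer
}
print(retrieve_subgraph(graph, "q", missing="image", k=1))  # → ['a']
```

Node `c` closely matches the query's text but is skipped because it lacks an image, which is one simple way to make retrieval modality-aware.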
Extensive experiments on standard multimodal recommendation benchmarks show that GRE-MC consistently outperforms previous state-of-the-art methods. The key innovation is the retrieval mechanism: by pulling in distant but relevant context from the whole graph, the model captures subtle patterns that neighbor-only approaches miss. The sparse-routing codebook also helps prevent overfitting when large portions of the modality data are missing. For engineers building production recommendation systems, this points to higher accuracy and reliability under real-world data corruption or privacy restrictions. The paper is available on arXiv and has already drawn interest for integration into graph-based personalization pipelines.
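One way to read "compresses latent embeddings into compact bases" is sparse top-s routing over a codebook, sketched below in Python. This is an assumption borrowed from mixture-of-experts-style routing, not the paper's stated formulation: each latent vector is scored against every basis, only the top `s` bases are kept, and the output is their softmax-weighted sum. The codebook is fixed here for clarity, whereas in GRE-MC it is learnable, and training losses are omitted. `sparse_route` and the toy codebook are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sparse_route(z: Vector, codebook: List[Vector], s: int = 2) -> Tuple[Vector, List[int]]:
    """Re-express latent z as a mix of at most s codebook bases.

    1. Score every basis by its dot product with z.
    2. Keep only the top-s scores (the sparse route); every other basis gets weight 0.
    3. Softmax the kept scores and return the weighted sum of their bases,
       together with the sorted indices of the bases that were used.
    """
    scores = [dot(z, c) for c in codebook]
    top = sorted(range(len(codebook)), key=lambda i: scores[i], reverse=True)[:s]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    out = [0.0] * len(z)
    for i in top:
        weight = exps[i] / total
        for d, value in enumerate(codebook[i]):
            out[d] += weight * value
    return out, sorted(top)


# Hypothetical 2-D codebook with three bases (fixed here; learned in training).
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
compact, used = sparse_route([0.8, 0.3], codebook, s=1)
print(compact, used)  # → [1.0, 0.0] [0]
```

With `s=1` the routing reduces to picking the single highest-scoring basis, much like vector quantization; larger `s` trades compactness for expressiveness. Confining every embedding to a few shared bases plausibly limits how far it can drift toward noise in sparse observed features, which is the intuition behind the robustness claim.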
- GRE-MC uses a modality-aware subgraph retrieval mechanism to pull semantically relevant context from the entire graph, not just local neighbors.
- A graph transformer with global attention encodes the query node and retrieved subgraph to reconstruct missing features.
- A learnable sparse-routing codebook regularizes latent embeddings into compact bases, improving robustness against missing data.
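The reconstruction step in the second bullet can be illustrated with a single attention step in plain Python. This is a deliberate simplification, not the paper's architecture: it shows only the query node's row of an unmasked (global) attention layer, uses identity projections instead of learned query, key, and value matrices, and has one head and one layer. The function name `attend_and_reconstruct` and its inputs are hypothetical; the context nodes stand in for the retrieved subgraph.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_and_reconstruct(
    query_obs: Vector,
    context_obs: List[Vector],
    context_target: List[Vector],
) -> Vector:
    """Estimate the query node's missing modality from retrieved context nodes.

    Keys are the context nodes' embeddings for a modality the query does have;
    values are their embeddings for the modality the query is missing. No
    adjacency mask is applied, so the query attends to every retrieved node.
    Projections are identity here; a real graph transformer would learn them.
    """
    scale = math.sqrt(len(query_obs))
    scores = [sum(q * k for q, k in zip(query_obs, key)) / scale for key in context_obs]
    weights = softmax(scores)
    out = [0.0] * len(context_target[0])
    for weight, value in zip(weights, context_target):
        for i, v in enumerate(value):
            out[i] += weight * v
    return out


# Query has text but no image; two retrieved nodes have both.
estimate = attend_and_reconstruct(
    query_obs=[1.0, 0.0],
    context_obs=[[1.0, 0.0], [0.0, 1.0]],
    context_target=[[2.0, 0.0], [0.0, 2.0]],
)
print(estimate)  # leans toward the first node's image embedding, whose text matches
```

Because nothing masks non-adjacent nodes, each retrieved node contributes according to feature similarity alone. A full graph transformer would additionally let the context nodes attend to one another and to the query across several layers.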
Why It Matters
Makes AI recommendations more reliable even when cameras fail or privacy rules hide data.