SCALR: Synthetic Data from Cross-Domain Events Boosts Recommendation Systems
New framework generates synthetic user-item interactions to fight data sparsity in recommendations.
Large-scale recommendation systems operating across diverse domains face persistent challenges of data sparsity and noisy implicit feedback. Traditional approaches rely on model-specific knowledge distillation from source to target domains, but these often struggle to generalize. In a new paper, researchers from an industrial team introduce SCALR (Synthetic Cross-domain Augmentation and Learning for Recommendation), a framework inspired by the transformative success of synthetic data generation in large language models. SCALR generates synthetic user-item interaction events for a target domain by leveraging observed events from a source domain, effectively translating cross-domain behavior into training data.
SCALR decomposes cross-domain learning into two modular stages. First, it translates observed source-domain user events into synthetic target-domain events, framing event generation as estimating the probability that a user would interact with a target-domain item given that user's source-domain interactions. Second, downstream recommendation models train on these synthetic events in a model-agnostic manner, with the synthetic events augmenting the target domain's training data. The researchers report statistically significant improvements in online A/B tests on an industrial recommendation platform. To their knowledge, this is among the first works to explicitly frame cross-domain event transfer as synthetic data generation for recommendation systems, opening a new avenue for tackling data scarcity without complex model-specific adaptations.
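To make the two stages concrete, here is a minimal Python sketch. It is not the paper's method: the summary does not describe SCALR's actual generator, so a simple co-occurrence estimate over users seen in both domains stands in for it. The function names (`fit_translation`, `generate_synthetic_events`), the `threshold` parameter, the weighting scheme, and the toy book/film data are all hypothetical.

```python
from collections import defaultdict


def group_by_user(events):
    """Map each user to the set of items they interacted with."""
    by_user = defaultdict(set)
    for user, item in events:
        by_user[user].add(item)
    return by_user


def fit_translation(source_events, target_events):
    """Estimate P(target item | source item) from users seen in both domains.

    Assumption: a co-occurrence ratio stands in for the learned generator.
    """
    src, tgt = group_by_user(source_events), group_by_user(target_events)
    pair_counts = defaultdict(lambda: defaultdict(int))
    src_counts = defaultdict(int)
    for user in src.keys() & tgt.keys():
        for s in src[user]:
            src_counts[s] += 1
            for t in tgt[user]:
                pair_counts[s][t] += 1
    return {
        s: {t: c / src_counts[s] for t, c in ts.items()}
        for s, ts in pair_counts.items()
    }


def generate_synthetic_events(source_events, target_events, translation, threshold=0.6):
    """Stage 1: emit (user, target_item, score) for likely unseen interactions."""
    observed = set(target_events)
    synthetic = []
    src = group_by_user(source_events)
    for user in sorted(src):
        totals = defaultdict(float)
        for s in src[user]:
            for t, p in translation.get(s, {}).items():
                totals[t] += p
        for t in sorted(totals):
            # Average the per-source-item estimates over the user's source history.
            score = totals[t] / len(src[user])
            if score >= threshold and (user, t) not in observed:
                synthetic.append((user, t, score))
    return synthetic


# Toy data (hypothetical): books are the source domain, films the target domain.
source_events = [("u1", "book_a"), ("u2", "book_a"), ("u3", "book_a"), ("u1", "book_b")]
target_events = [("u1", "film_x"), ("u2", "film_x"), ("u2", "film_y")]

translation = fit_translation(source_events, target_events)
synthetic = generate_synthetic_events(source_events, target_events, translation)

# Stage 2: model-agnostic augmentation. Any downstream model that accepts
# (user, item, weight) rows can train on the combined list.
augmented = [(u, i, 1.0) for u, i in target_events] + synthetic
print(synthetic)  # [('u3', 'film_x', 1.0)]
```

In this toy run, user `u3` has no target-domain history at all, yet receives a synthetic `film_x` event because every overlapping user who read `book_a` also watched `film_x`. Carrying the score along as a sample weight is one possible design choice; it lets a downstream model discount synthetic events relative to observed ones.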
- SCALR translates source-domain user events into synthetic target-domain interactions by estimating interaction likelihood conditioned on source behavior.
- It uses a two-stage modular approach: event generation, then model-agnostic training on the synthetic data.
- The researchers report statistically significant improvements in online A/B tests on an industrial recommendation platform.
Why It Matters
SCALR tackles data sparsity in large-scale recommender systems with synthetic data generation inspired by its success in large language models.