SCALR: Synthetic Data from Cross-Domain Events Boosts Recommendation Systems

arXiv cs.IR June 02, 2026

⚡ New framework generates synthetic user-item interactions to fight data sparsity in recommendations.

Deep Dive

Large-scale recommendation systems operating across diverse domains face persistent challenges of data sparsity and noisy implicit feedback. Traditional approaches rely on model-specific knowledge distillation from source to target domains, but these often struggle to generalize. In a new paper, researchers from an industrial team introduce SCALR (Synthetic Cross-domain Augmentation and Learning for Recommendation), a framework inspired by the transformative success of synthetic data generation in large language models. SCALR generates synthetic user-item interaction events for a target domain by leveraging observed events from a source domain, effectively translating cross-domain behavior into training data.

SCALR decomposes cross-domain learning into two modular stages. First, it generates synthetic target-domain events from the user events observed in source domains, framing generation as estimating the probability that a user would interact with a target-domain item given their source-domain interactions. Second, downstream recommendation models train on these synthetic events in a model-agnostic manner, so the synthetic events simply augment the target domain's training data. The researchers report statistically significant improvements in online A/B tests on an industrial recommendation platform. According to the authors, this is among the first works to explicitly frame cross-domain event transfer as synthetic data generation for recommendation systems, which opens a new avenue for tackling data scarcity without complex model-specific adaptations.
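To make the two stages concrete, here is a minimal Python sketch. It is not the paper's method: the summary does not say how SCALR estimates interaction likelihood, so this example swaps in a simple co-occurrence estimate computed from users seen in both domains. All function names, the `threshold` parameter, and the toy events are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def group_by_user(events):
    """Collect the set of items each user interacted with."""
    by_user = defaultdict(set)
    for user, item in events:
        by_user[user].add(item)
    return by_user


def fit_cooccurrence(source_events, target_events):
    """Estimate P(target item | source item) from users present in both domains.

    Assumption: a plain co-occurrence ratio stands in for whatever
    likelihood estimator the paper actually uses.
    """
    src_by_user = group_by_user(source_events)
    tgt_by_user = group_by_user(target_events)
    pair_counts = defaultdict(Counter)  # source item -> counts of target items
    src_counts = Counter()              # overlapping users per source item
    for user in src_by_user.keys() & tgt_by_user.keys():
        for s in src_by_user[user]:
            src_counts[s] += 1
            for t in tgt_by_user[user]:
                pair_counts[s][t] += 1
    return {
        s: {t: c / src_counts[s] for t, c in targets.items()}
        for s, targets in pair_counts.items()
    }


def generate_synthetic_events(source_events, cond_prob, threshold=0.6):
    """Stage 1: emit (user, target_item, score) when the estimate clears the threshold."""
    synthetic = []
    for user, items in group_by_user(source_events).items():
        scores = defaultdict(float)
        for s in items:
            for t, p in cond_prob.get(s, {}).items():
                scores[t] = max(scores[t], p)  # keep the strongest source signal
        for t, score in sorted(scores.items()):
            if score >= threshold:
                synthetic.append((user, t, score))
    return synthetic


def augment_training_data(target_events, synthetic):
    """Stage 2 input: observed target events plus synthetic ones not already observed."""
    observed = set(target_events)
    extra = [(u, t) for u, t, _ in synthetic if (u, t) not in observed]
    return list(target_events) + extra


# Toy data (hypothetical): u3 has source-domain history but no target-domain events.
source_events = [("u1", "s_a"), ("u2", "s_a"), ("u3", "s_a"), ("u3", "s_b")]
target_events = [("u1", "t_x"), ("u2", "t_x"), ("u2", "t_y")]

cond_prob = fit_cooccurrence(source_events, target_events)
synthetic = generate_synthetic_events(source_events, cond_prob)
augmented = augment_training_data(target_events, synthetic)
print(augmented)
```

The output of `augment_training_data` is an ordinary list of user-item pairs, which is what makes the second stage model-agnostic in this sketch: any downstream recommender that trains on interaction pairs can consume it unchanged.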

Key Points

SCALR translates source domain user events into synthetic target domain interactions by estimating interaction likelihood conditioned on source behavior.
Two-stage modular approach: event generation, then model-agnostic training on synthetic data.
The authors report statistically significant improvements in online A/B tests on an industrial recommendation platform.

Why It Matters

Tackles data sparsity in large-scale recommender systems using LLM-inspired synthetic data generation.
