FOSTER distills text-based recommenders using only 20 synthetic sequences
New dataset distillation technique approaches full-dataset accuracy with as few as 20 synthetic interaction sequences
Text-based sequential recommenders use item descriptions to improve personalization, but training them on large catalogs is computationally expensive because every item description must be encoded by a language model. To address this, a team of researchers (Hung Vinh Tran, Tong Chen, Xinyi Gao, Junliang Yu, Julien Monteil, Hongzhi Yin) introduced FOSTER, a dataset distillation framework that compresses a full dataset into a compact set of synthetic interaction sequences. Unlike traditional distillation, which requires costly bi-level optimization, FOSTER gains efficiency through three innovations: stochastic item subset sampling, which avoids full-corpus embedding extraction at each step; first-order optimization with trajectory-anchored parameter reset, which eliminates expensive second-order gradients; and a regularization term that encourages semantically similar items to co-occur in synthetic sequences.
In experiments across three public benchmarks, FOSTER consistently outperformed both dataset distillation baselines and coreset selection methods. It approximated full-dataset performance using as few as 20 synthetic interaction sequences, reaching, for example, 99.5% of the full-dataset NDCG@10 on one dataset. This represents a potential 1000x reduction in training data volume while maintaining recommendation quality. Because each distillation step touches only a sampled subset of items, the method can also scale to datasets with millions of items. The code is expected to be released, and the paper is available on arXiv (2605.30772).
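To make the "99.5% of the full-dataset NDCG@10" figure concrete, here is a minimal Python sketch of how such a retention ratio could be computed. It assumes the common next-item setup with a single held-out target per user, which this summary does not confirm is the paper's exact protocol. The function names (ndcg_at_k, mean_ndcg, retention) are illustrative and not taken from the FOSTER code.

```python
import math

def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k for one user with a single held-out target item.

    With exactly one relevant item the ideal DCG is 1, so the score
    reduces to 1 / log2(rank + 1) when the target is in the top k.
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the target
    return 1.0 / math.log2(rank + 1)

def mean_ndcg(rankings, targets, k=10):
    """Average NDCG@k over all evaluated users."""
    scores = [ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)]
    return sum(scores) / len(scores)

def retention(distilled_score, full_score):
    """Fraction of the full-dataset score kept by the distilled model."""
    return distilled_score / full_score
```

A retention of 0.995 would mean the model trained on the synthetic sequences scores 99.5% of what the model trained on the full dataset scores under the same metric.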
- Stochastic item subset sampling avoids costly full-corpus embedding extraction at each distillation step.
- First-order optimization with trajectory-anchored parameter reset eliminates expensive bi-level gradient computation.
- Regularization promotes co-occurrence between semantically similar items, enabling high performance with only 20 synthetic sequences.
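Two of the ideas above can be illustrated with a toy Python sketch. It is not the authors' implementation: the helper names, the choice of cosine similarity, and the adjacent-pair penalty are all assumptions made for illustration. The first function draws a random item subset per step, so that only those items would need to be encoded. The second scores how semantically close neighboring items in a sequence are.

```python
import math
import random

def sample_item_subset(num_items, subset_size, required=(), rng=random):
    """Draw a random subset of item IDs, always keeping the `required` ones.

    Hypothetical helper: only the returned IDs would be passed to the
    text encoder in a given step, instead of the whole catalog.
    """
    required = set(required)
    pool = [i for i in range(num_items) if i not in required]
    n_extra = max(0, min(subset_size - len(required), len(pool)))
    extra = rng.sample(pool, n_extra)
    return sorted(required | set(extra))

def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cooccurrence_penalty(sequence, embeddings):
    """Mean (1 - cosine similarity) over adjacent items in a sequence.

    Assumed stand-in for a co-occurrence regularizer: the value is lower
    when neighboring items have similar text embeddings.
    """
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0
    total = sum(1.0 - cosine(embeddings[a], embeddings[b]) for a, b in pairs)
    return total / len(pairs)

# Example usage with a toy catalog of 1000 items and 2-d embeddings.
rng = random.Random(0)
subset = sample_item_subset(1000, 50, required=[3, 7], rng=rng)
toy_embeddings = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
similar_pair = cooccurrence_penalty([0, 1], toy_embeddings)
dissimilar_pair = cooccurrence_penalty([0, 2], toy_embeddings)
```

In this toy setup the sequence with two identical embeddings receives a lower penalty than the one with orthogonal embeddings, which is the direction a regularizer of this kind would push synthetic sequences.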
Why It Matters
FOSTER reduces training data needs by orders of magnitude, which could make recommender system development cheaper and faster for large-scale deployments.