TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds

New architecture targets the 'Sequential Collapse Propagation' problem that has hampered previous unified recommendation models.

Deep Dive

A team of researchers from Tencent and collaborating institutions has introduced TokenFormer, a novel architecture designed to unify two historically separate branches of recommender systems. For years, the field has been split between models that excel at analyzing multi-field categorical features (like user demographics and product categories) and those specialized for sequential user behavior (like click histories). Previous attempts to merge these approaches within a single model often failed due to a problem the researchers identified as Sequential Collapse Propagation (SCP), where non-sequence data corrupts the quality of sequence representations.

TokenFormer tackles this core challenge with two key innovations. First, it employs a Bottom-Full-Top-Sliding (BFTS) attention scheme, which applies standard self-attention in early layers to capture broad context and then uses a shrinking-window sliding attention in upper layers to focus on local sequential patterns efficiently. Second, it introduces a Non-Linear Interaction Representation (NLIR) module, which applies one-sided non-linear transformations to hidden states to better model complex feature interactions without degrading sequence information.
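As a rough illustration of the BFTS idea, the attention masks for such a layer stack could be built as in the sketch below. This is not the paper's implementation: the function names (`window_mask`, `bfts_masks`, `masked_attention`), the symmetric window, and the rule that halves the window at each upper layer are all assumptions chosen for illustration, since the summary above does not specify the actual schedule.

```python
import numpy as np


def window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where token i may attend to token j if |i - j| < window.

    Assumption: a symmetric (bidirectional) window; the real model may be causal.
    """
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) < window


def bfts_masks(seq_len: int, num_layers: int, num_full_layers: int,
               start_window: int) -> list:
    """One mask per layer: full attention at the bottom, sliding windows on top.

    Assumption: the window halves at each upper layer (never below 1).
    This schedule is hypothetical.
    """
    masks = []
    for layer in range(num_layers):
        if layer < num_full_layers:
            # Bottom layers: every token attends to every token.
            masks.append(np.ones((seq_len, seq_len), dtype=bool))
        else:
            # Top layers: local window that shrinks with depth.
            step = layer - num_full_layers
            window = max(1, start_window >> step)
            masks.append(window_mask(seq_len, window))
    return masks


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention restricted by a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability; the diagonal is always
    # allowed, so every row has at least one finite score.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


masks = bfts_masks(seq_len=8, num_layers=4, num_full_layers=2, start_window=4)
# Number of allowed (query, key) pairs per layer shrinks in the upper layers.
print([int(m.sum()) for m in masks])
```

With these assumed settings, the two lower layers see the whole input, while the two upper layers attend within windows of 4 and then 2 positions, so the cost of the upper layers scales with the window size rather than the full sequence length.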

The paper reports experiments on public benchmarks and on Tencent's advertising platform in which TokenFormer achieves state-of-the-art performance. The authors' analysis further indicates that the model improves the dimensional robustness and discriminative power of the learned representations under a unified framework. If these results hold up, platforms could deploy a single model instead of maintaining separate systems for multi-field and sequential recommendation, with more accurate and consistent predictions as the expected payoff.

Key Points
  • Targets 'Sequential Collapse Propagation' (SCP), a failure mode in which non-sequence features degrade the quality of sequence representations in a unified model.
  • Introduces Bottom-Full-Top-Sliding (BFTS) attention (full self-attention in lower layers, a shrinking sliding window in upper layers) and the Non-Linear Interaction Representation (NLIR) module.
  • Reports state-of-the-art results on public benchmarks and Tencent's ad platform, along with improved representation quality and robustness.

Why It Matters

Could enable more accurate, unified recommender systems for large platforms such as Tencent, improving ad targeting and content discovery at scale.