ReCast: Recasting Learning Signals for Reinforcement Learning in Generative Recommendation
New framework turns wasted rollout data into learnable policy updates for sparse-hit recommendations.
A team of researchers from Alibaba and multiple universities has introduced ReCast, a novel framework that rethinks how reinforcement learning (RL) signals are constructed for generative recommendation systems. The paper, submitted to arXiv on April 24, 2026, identifies a critical flaw in generic group-based RL: when applied to sparse-hit generative recommendation, many sampled rollout groups carry all-zero or single-hit rewards and therefore provide no learnable contrast. ReCast addresses this with a two-stage approach: it first repairs these degenerate groups to restore minimal learnability, then replaces traditional full-group reward normalization with a boundary-focused contrastive update that targets the strongest positive and hardest negative examples. The outer RL framework is left unchanged; only the within-group signal construction is modified.
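The paper's exact construction is not reproduced here, but the core contrast can be sketched in a few lines. The snippet below is an illustrative assumption, not the authors' implementation: it shows why full-group reward normalization collapses on degenerate (all-zero or uniform-reward) groups, and how a boundary-focused signal might instead single out the strongest positive (highest-scored hit) and the hardest negative (highest-scored miss). Function names and the binary-hit/score representation are invented for this sketch.

```python
import numpy as np

def full_group_advantages(rewards):
    """Standard group normalization: each rollout's advantage is its
    reward standardized against the group. If the group's rewards are
    all-zero (or otherwise constant), every advantage is zero and the
    update carries no learning signal."""
    r = np.asarray(rewards, dtype=float)
    std = r.std()
    if std < 1e-8:
        return np.zeros_like(r)  # degenerate group: nothing to learn
    return (r - r.mean()) / std

def boundary_contrastive_signal(hits, scores):
    """Sketch of a boundary-focused alternative: given binary hit
    outcomes and model scores for one rollout group, emit a contrastive
    signal on just two rollouts, the strongest positive (highest-scored
    hit) and the hardest negative (highest-scored miss), instead of
    normalizing over the whole group."""
    hits = np.asarray(hits, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    signal = np.zeros_like(scores)
    if not hits.any() or hits.all():
        return signal  # no hit/miss boundary exists in this group
    best_pos = np.where(hits)[0][np.argmax(scores[hits])]
    hard_neg = np.where(~hits)[0][np.argmax(scores[~hits])]
    signal[best_pos] = 1.0
    signal[hard_neg] = -1.0
    return signal
```

For example, a group with hits `[0, 1, 0, 1]` and scores `[0.9, 0.2, 0.5, 0.8]` yields `[-1, 0, 0, 1]`: the miss scored 0.9 is pushed down and the hit scored 0.8 is pushed up, while an all-zero group produces a zero vector, matching the "degenerate group" failure mode the paper targets.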
The results are striking. Across multiple generative recommendation tasks, ReCast consistently outperforms the OpenOneRec-RL baseline, achieving up to 36.6% relative improvement in Pass@1. Its matched-budget advantage is even more dramatic: ReCast reaches the baseline's target performance using just 4.1% of the rollout budget, and this advantage grows with model scale. Beyond accuracy, ReCast delivers direct system-level gains, including a 16.60x reduction in actor-side update time, a 16.5% drop in peak allocated memory, and a 14.2% improvement in model FLOPs utilization (MFU). The authors conclude that for generative recommendation, the decisive RL problem is not just how to assign rewards, but how to construct learnable optimization events from sparse, structured supervision.
- ReCast achieves up to 36.6% relative improvement in Pass@1 over OpenOneRec-RL across generative recommendation tasks
- ReCast matches baseline performance using only 4.1% of the rollout budget, with advantage widening as model scale increases
- System-level gains include 16.60x faster actor-side updates, 16.5% lower peak memory, and 14.2% higher model FLOPs utilization (MFU)
Why It Matters
ReCast makes RL practical for sparse recommendation systems, dramatically cutting compute costs while boosting accuracy.