Learning to Reflect and Correct: Towards Better Decoding Trajectories for Large-Scale Generative Recommendation
New AI framework adds a 'self-critique' step to recommendation models, lifting ad revenue by 1.79% in online tests.
A research team has introduced GRC (Generation-Reflection-Correction), a framework that targets a core weakness of current Generative Recommendation (GR) systems. These models generate recommendations in a single decoding pass, so an early mistake compounds through the rest of the sequence and degrades the final output. GRC restructures decoding by adding explicit 'reflection' and 'correction' stages: after generating an initial draft, the model performs multi-granular reflection on its own output in the semantic token space, then executes a reflection-guided correction. The result is a more robust decoding trajectory.
To optimize over the larger decoding space that reflection and correction introduce, the team uses GRPO-based reinforcement learning with a custom reward function that combines token-level and trajectory-level signals. For practical deployment, they developed an Entropy-Guided Reflection Scheduling (EGRS) strategy, which allocates more of the computational 'correction budget' to high-uncertainty decoding paths during beam search, so extra compute goes where it is most likely to help. In experiments on real-world datasets, GRC consistently outperforms six leading baselines, by up to 15.74%. In online A/B tests in a large-scale industrial setting, it delivered a significant 1.79% lift in advertising revenue with only a modest increase in latency, indicating that the improved recommendation quality carries over to business value.
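The two mechanisms in this paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation. The group-relative normalization follows the general GRPO recipe, while the mixing weight `alpha`, the proportional split of the budget, and every function name here are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def combined_reward(token_rewards, trajectory_reward, alpha=0.5):
    """Hypothetical mix of a token-level and a trajectory-level signal.

    The weighting scheme (alpha) is an assumption, not taken from the paper.
    """
    return alpha * mean(token_rewards) + (1.0 - alpha) * trajectory_reward


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def allocate_reflection_budget(beam_distributions, total_budget):
    """Split an integer correction budget across beams in proportion to entropy.

    Proportional allocation is an assumed rule; the source only says that
    high-uncertainty decoding paths receive more of the budget.
    """
    entropies = [entropy(p) for p in beam_distributions]
    total = sum(entropies)
    if total == 0.0:
        return [0] * len(entropies)  # every beam is fully confident
    raw = [total_budget * h / total for h in entropies]
    budget = [int(x) for x in raw]
    # Hand the rounding remainder to the beams with the largest fractional parts.
    leftovers = total_budget - sum(budget)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - budget[i], reverse=True)
    for i in order[:leftovers]:
        budget[i] += 1
    return budget


# Three beams: fully confident, two-way tie, four-way tie.
beams = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
budget = allocate_reflection_budget(beams, total_budget=6)
advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

In this toy setup the confident beam receives no correction steps and the most uncertain beam receives the largest share, which is the behavior the EGRS description implies. The advantages are centered on zero, so samples that score above their group average are reinforced and those below it are discouraged.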
- The GRC framework adds structured 'reflection' and 'correction' steps to generative recommendation models, outperforming six baselines by up to 15.74% on real-world datasets.
- Uses GRPO-based reinforcement learning and a novel Entropy-Guided Reflection Scheduling strategy for efficient online serving.
- Validated in online A/B tests in a large-scale industrial setting, where it delivered a 1.79% increase in advertising revenue.
Why It Matters
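Recommendation models shape what users see on large platforms such as TikTok and Amazon. The reported results suggest that a reflect-and-correct decoding step can improve both recommendation accuracy and revenue at a modest cost in latency.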