Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation
A self-evolving alignment method that keeps improving recommendation quality as the model learns, outperforming static approaches
Large Language Models (LLMs) are increasingly used in recommender systems (LLM4Rec) to model complex user preferences, but they struggle with list-level, non-differentiable metrics like NDCG and fairness. Existing Best-of-N (BoN) methods optimize these metrics at inference time but are computationally expensive. BoN Alignment distills this search capability into the model, yet suffers from two limitations: indiscriminate supervision (static reference lists can no longer guide ranking once the policy's own candidates surpass them) and gradient decay (supervision signals fade as the policy improves). Together these make optimization inefficient and cap final performance.
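As a rough illustration of the Best-of-N idea described above (not the paper's implementation), the model samples N candidate rankings and keeps the one with the highest list-level score. The `ndcg_at_k` helper and the random "policy" below are simplified stand-ins:

```python
import math
import random

def ndcg_at_k(ranking, relevance, k=10):
    """Simplified NDCG@k: `ranking` is a list of item ids, `relevance` maps id -> gain."""
    dcg = sum(relevance.get(item, 0) / math.log2(pos + 2)
              for pos, item in enumerate(ranking[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def best_of_n(sample_ranking, relevance, n=8):
    """Draw n candidate rankings from the policy and keep the highest-scoring one."""
    candidates = [sample_ranking() for _ in range(n)]
    return max(candidates, key=lambda r: ndcg_at_k(r, relevance))

# Toy example: the "policy" is just a random shuffle of five items.
items = ["a", "b", "c", "d", "e"]
relevance = {"a": 3, "b": 2, "c": 1}  # hypothetical per-item gains
rng = random.Random(0)
best = best_of_n(lambda: rng.sample(items, len(items)), relevance)
print(best, ndcg_at_k(best, relevance))
```

The cost issue is visible even in this toy: every request pays for n full rollouts plus n metric evaluations, which is exactly what BoN Alignment tries to amortize into training.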
To address this, researchers from Zhejiang University and other institutions propose BLADE (Bayesian List-wise Alignment via Dynamic Estimation). BLADE introduces a Bayesian framework that fuses historical priors with dynamic evidence from current model rollouts, creating a self-evolving target distribution that adapts as the model improves. This keeps training signals informative throughout learning. Extensive experiments on three real-world datasets show BLADE significantly outperforms state-of-the-art baselines, achieving sustained gains in ranking accuracy (Recall, NDCG) and complex list-wise metrics (fairness, diversity). Code is available on GitHub. The work is accepted at SIGIR 2026.
- BLADE uses a Bayesian framework to continuously update its target distribution by combining historical priors with dynamic rollout evidence
- Solves two key problems in static Best-of-N alignment: indiscriminate supervision (loss of ranking guidance) and gradient decay (vanishing supervision signals)
- Outperforms state-of-the-art baselines on three datasets for Recall, NDCG, fairness, and diversity metrics
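The self-evolving target distribution can be sketched, under strong simplifying assumptions, as a Bayesian posterior over candidate lists: a historical prior is reweighted by likelihood evidence derived from the current policy's rollout scores, and the posterior becomes the prior for the next round. This is an illustrative toy, not BLADE's actual update rule; the softmax likelihood and temperature are assumptions:

```python
import math

def softmax(xs, temp=1.0):
    m = max(xs)
    exps = [math.exp((x - m) / temp) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def bayesian_update(prior, rollout_scores, temp=0.5):
    """Fuse a prior over candidate lists with dynamic rollout evidence:
    posterior[i] is proportional to prior[i] * likelihood[i], where the
    likelihood comes from a softmax over the current policy's metric scores."""
    likelihood = softmax(rollout_scores, temp)
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Toy example: three candidate lists, uniform prior, two rounds of
# hypothetical NDCG rollouts in which candidate 1 keeps improving.
target = [1 / 3, 1 / 3, 1 / 3]
for step_scores in [[0.2, 0.5, 0.3], [0.1, 0.7, 0.4]]:
    target = bayesian_update(target, step_scores)  # posterior becomes next prior
print(target)
```

Because the evidence is re-drawn from the current policy each round, the target keeps shifting mass toward whatever the improving model now ranks best, which is why the training signal stays informative instead of decaying against a frozen reference.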
Why It Matters
Real-world recommendation systems can now deliver more accurate and fair results without costly inference-time search