ViPO: Visual Preference Optimization at Scale
New dataset and algorithm tackle noisy preference labels for better image and video generation
A team of researchers has introduced ViPO (Visual Preference Optimization at Scale), a large-scale preference dataset designed to address data bottlenecks in visual AI training. The dataset includes 1M image pairs at 1024px resolution across five categories and 300K video pairs at 720p+ across three categories. To tackle the noisy preference signals in existing datasets, the team also developed Poly-DPO, an extension of the standard Direct Preference Optimization (DPO) algorithm. Poly-DPO adds a polynomial term that dynamically adjusts model confidence to the characteristics of the dataset, enabling robust learning even when preference labels conflict.
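The article does not give Poly-DPO's exact objective, so the snippet below is only a minimal sketch of what such a polynomial correction could look like, assuming a PolyLoss-style expansion of the DPO loss in the model's preference confidence. The function name `poly_dpo_loss` and the hyperparameters `eps` (polynomial weight) and `n` (degree) are hypothetical, not from the paper; setting `eps = 0` recovers plain DPO, which matches the convergence behavior reported below.

```python
import torch
import torch.nn.functional as F

def poly_dpo_loss(pi_chosen_logps, pi_rejected_logps,
                  ref_chosen_logps, ref_rejected_logps,
                  beta=0.1, eps=0.0, n=2):
    # Implicit reward margin between the preferred and rejected sample,
    # exactly as in standard DPO.
    margin = beta * ((pi_chosen_logps - ref_chosen_logps)
                     - (pi_rejected_logps - ref_rejected_logps))
    p = torch.sigmoid(margin)            # model's confidence in the preference label
    dpo_term = -F.logsigmoid(margin)     # standard DPO loss: -log p
    poly_term = eps * (1.0 - p) ** n     # assumed polynomial correction (hypothetical form)
    return (dpo_term + poly_term).mean() # eps = 0 reduces to plain DPO

# Toy usage: sequence log-probs for a batch of 4 preference pairs.
pi_c, pi_r = torch.randn(4), torch.randn(4)
ref_c, ref_r = torch.randn(4), torch.randn(4)
loss = poly_dpo_loss(pi_c, pi_r, ref_c, ref_r, beta=0.1, eps=1.0, n=2)
print(loss.item())
```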
In tests on the noisy Pick-a-Pic V2 dataset, Poly-DPO outperformed Diffusion-DPO by 6.87 points on GenEval for SD1.5 and by 2.32 points for SDXL. Interestingly, when applied to the high-quality ViPO dataset, Poly-DPO's optimal configuration converged to standard DPO, showing that sophisticated optimization machinery is unnecessary when the data is clean. This finding underscores the importance of both algorithmic adaptability and data quality for scaling visual preference optimization. The project page and code are available online.
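For reference, the standard DPO objective that Poly-DPO collapses to in the clean-data regime is the logistic loss on implicit reward margins from the original DPO paper, where σ is the logistic sigmoid, y_w and y_l are the preferred and rejected samples, and π_ref is the frozen reference model:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```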
- ViPO dataset includes 1M image pairs at 1024px and 300K video pairs at 720p+
- Poly-DPO beat Diffusion-DPO on GenEval by 6.87 points (SD1.5) and 2.32 points (SDXL) on noisy data
- High-quality data made advanced optimization unnecessary, converging to standard DPO
Why It Matters
Scaling visual AI training requires both clean data and adaptive algorithms; ViPO and Poly-DPO deliver both.