CroCo enables multilingual AI preference tuning without language-specific data
A single English reward model now improves LLM outputs across 14 languages
Mike Zhang, Ali Basirat, and Desmond Elliott introduce CroCo, which extends contrastive preference tuning to 14 high- and low-resource languages. Using an English-only reward model on top of multilingual base models (EuroLLM-9B, Aya-3B), the method transfers to other languages without language-specific annotations. On-policy data is critical; off-policy responses reduce the gains. On structured tasks, performance matches or exceeds baselines in most languages, while on open-ended generation CroCo-tuned models beat the base models in all 11 evaluated languages.
- CroCo uses an English-only reward model on multilingual base models to tune preferences across 14 languages without language-specific annotations.
- On-policy data is essential: off-policy responses reduce the gains, and online optimization does not outperform the offline variant.
- On open-ended generation tasks, CroCo-tuned models win against base models in all 11 evaluated languages for both EuroLLM-9B and Aya-3B.
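To make the on-policy idea in the points above concrete, here is a minimal sketch of how preference pairs might be built: the model being tuned generates several candidate responses per prompt, a reward model scores them, and the highest- and lowest-scoring candidates become the "chosen" and "rejected" examples. This is an illustration under stated assumptions, not the authors' pipeline. The names `build_preference_pairs`, `sample_fn`, and `reward_fn` are hypothetical, the best-versus-worst pairing rule is an assumption, and the toy sampler and scorer stand in for a real multilingual model and an English-only reward model.

```python
from typing import Callable, Dict, List


def build_preference_pairs(
    prompts: List[str],
    sample_fn: Callable[[str, int], List[str]],
    reward_fn: Callable[[str, str], float],
    num_samples: int = 4,
) -> List[Dict[str, str]]:
    """Build (chosen, rejected) pairs from the policy's own samples.

    sample_fn and reward_fn are hypothetical stand-ins: sample_fn should draw
    responses from the model being tuned (that is what makes the data
    on-policy), and reward_fn should return a scalar score for a response.
    """
    pairs = []
    for prompt in prompts:
        candidates = sample_fn(prompt, num_samples)
        if len(candidates) < 2:
            continue  # need at least two responses to form a pair
        scored = [(reward_fn(prompt, response), response) for response in candidates]
        best = max(scored, key=lambda item: item[0])
        worst = min(scored, key=lambda item: item[0])
        if best[0] == worst[0]:
            continue  # no preference signal if all scores tie
        pairs.append({"prompt": prompt, "chosen": best[1], "rejected": worst[1]})
    return pairs


def toy_sample(prompt: str, n: int) -> List[str]:
    """Toy sampler (assumption): fakes n responses of increasing length."""
    return [prompt + " " + "x" * i for i in range(n)]


def toy_reward(prompt: str, response: str) -> float:
    """Toy reward (assumption): longer responses score higher."""
    return float(len(response))


if __name__ == "__main__":
    for pair in build_preference_pairs(["hola", "hej"], toy_sample, toy_reward, 3):
        print(pair)
```

In this sketch, the off-policy setting corresponds to a `sample_fn` that returns responses written by a different model than the one being tuned, which is the setting the summary reports as less effective.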
Why It Matters
CroCo removes the need for language-specific preference annotation, which makes multilingual LLM alignment cheaper and more scalable.