CroCo enables multilingual AI preference tuning without language-specific data

arXiv cs.CL May 27, 2026

⚡ A single English reward model now improves LLM outputs across 14 languages

Deep Dive

Mike Zhang, Ali Basirat, and Desmond Elliott introduce CroCo, which extends contrastive preference tuning to 14 high- and low-resource languages. The method applies an English-only reward model to multilingual base models (EuroLLM-9B and Aya-3B), so preferences transfer without language-specific annotations. On-policy data is critical: training on off-policy responses reduces the gains. On structured tasks, performance matches or exceeds the baselines in most languages; on open-ended generation, CroCo-tuned models win against their base models in all 11 evaluated languages.
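To make the on-policy idea concrete, here is a minimal sketch of how preference pairs could be built from a model's own samples and a single reward scorer. It is an illustration under stated assumptions, not the authors' pipeline: the summary does not say how CroCo forms its pairs, so the best-versus-worst pairing rule, the number of samples, and every name below (`PreferencePair`, `build_on_policy_pairs`, `sample_fn`, `reward_fn`, and the toy stand-ins) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PreferencePair:
    """One training example for preference tuning (hypothetical container)."""
    prompt: str
    chosen: str
    rejected: str


def build_on_policy_pairs(
    prompts: List[str],
    sample_fn: Callable[[str, int], List[str]],
    reward_fn: Callable[[str, str], float],
    num_samples: int = 4,
) -> List[PreferencePair]:
    """Build preference pairs from responses sampled from the policy itself.

    Assumption: the highest-scoring sample becomes `chosen` and the
    lowest-scoring one becomes `rejected`. The source does not specify
    the pairing rule, so this is only one plausible choice.
    """
    pairs: List[PreferencePair] = []
    for prompt in prompts:
        # On-policy: candidates come from the model being tuned,
        # not from another model or an existing dataset.
        candidates = sample_fn(prompt, num_samples)
        scored = [(reward_fn(prompt, response), response) for response in candidates]
        if len(scored) < 2:
            continue
        best_score, best = max(scored, key=lambda item: item[0])
        worst_score, worst = min(scored, key=lambda item: item[0])
        if best_score == worst_score:
            # All candidates tie, so there is no preference signal.
            continue
        pairs.append(PreferencePair(prompt=prompt, chosen=best, rejected=worst))
    return pairs


# Toy stand-ins so the sketch runs without any model.
# In practice, sample_fn would call the multilingual base model and
# reward_fn would call the English-only reward model.
def toy_sample(prompt: str, n: int) -> List[str]:
    return [f"{prompt} " + "ok " * i for i in range(n)]


def toy_reward(prompt: str, response: str) -> float:
    return float(len(response))


if __name__ == "__main__":
    for pair in build_on_policy_pairs(["Hej", "Hola"], toy_sample, toy_reward):
        print(pair)
```

An off-policy variant would differ only in where `candidates` come from (for example, responses written by a different model), which is the setting the summary reports as reducing the gains.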

Key Points

CroCo uses an English-only reward model on multilingual base models to tune preferences across 14 languages without language-specific annotations.
On-policy data is essential: off-policy responses reduce the gains, and online optimization does not outperform the offline variant.
On open-ended generation tasks, CroCo-tuned models win against base models in all 11 evaluated languages for both EuroLLM-9B and Aya-3B.

Why It Matters

CroCo eliminates the need for language-specific preference annotation, making multilingual LLM alignment cheaper and more scalable.
