Large Language Models for Multilingual Code Intelligence: A Survey
Most AI coding tools struggle with lower-resource languages such as Rust and OCaml.
A comprehensive survey by Chao Jiang and colleagues at multiple institutions examines the state of large language models (LLMs) in multilingual code intelligence. The paper, published on arXiv, finds that current LLMs exhibit a significant performance bias toward high-resource languages like Python while struggling with lower-resource languages such as Rust and OCaml. This imbalance undermines the utility of AI-assisted software engineering for real-world systems, which are inherently polyglot.
The survey focuses on two core tasks: multilingual code generation from shared natural-language requirements, and multilingual code translation that preserves semantics across languages. It systematically reviews representative methods, benchmarks, and evaluation metrics for these tasks, highlighting key challenges such as data scarcity for low-resource languages, semantic preservation during translation, and trustworthy cross-language generalization. The authors identify opportunities for future research, including better multilingual training datasets, more robust evaluation frameworks, and techniques to improve LLM performance on underrepresented programming languages.
- LLMs show significant performance bias toward high-resource languages like Python, with weaker performance on Rust and OCaml.
- Survey covers two core tasks: multilingual code generation from natural-language requirements and multilingual code translation.
- Identifies challenges in data scarcity, semantic preservation, and trustworthy cross-language generalization for polyglot systems.
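One common way to evaluate the semantic preservation the survey highlights is test-based functional equivalence: run the translated (or generated) code against the same test inputs as a trusted reference and require identical outputs. The sketch below illustrates that idea in miniature; the function names and test harness are illustrative assumptions, not an API from the surveyed paper.

```python
# Minimal sketch of test-based semantic-equivalence checking, the style of
# evaluation used to judge whether a code translation preserves behavior.
# All names here are illustrative assumptions, not the survey's method.

def passes_shared_tests(candidate_fn, reference_fn, test_inputs):
    """Return True if the candidate agrees with the reference on every input."""
    for args in test_inputs:
        if candidate_fn(*args) != reference_fn(*args):
            return False
    return True

# Reference implementation: the source-language behavior to preserve.
def gcd_reference(a, b):
    while b:
        a, b = b, a % b
    return a

# Hypothetical candidate, standing in for an LLM's translated output.
def gcd_candidate(a, b):
    return a if b == 0 else gcd_candidate(b, a % b)

inputs = [(48, 18), (7, 13), (0, 5), (100, 100)]
print(passes_shared_tests(gcd_candidate, gcd_reference, inputs))  # → True
```

In practice, benchmarks scale this idea up with curated test suites per problem and sandboxed execution per target language; agreement on the shared tests is treated as evidence (not proof) that semantics survived translation.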
Why It Matters
Highlights critical gaps in AI code tools for polyglot systems, urging better multilingual support.