
RAG Enables AI to Generate Singlish with Minimal Edits

New technique achieves natural Singlish with a median of just one lexical edit per sentence.

Deep Dive

Researchers Foong Ming Lai, Yujin Tan, Han Meng, and Yi-Chieh Lee have developed a retrieval-augmented generation (RAG) framework that converts Standard English into Singlish, the colloquial English-based creole spoken in Singapore. The key innovation is externalizing dialectal knowledge into a curated lexicon, which enables controlled lexical code-switching without fine-tuning or large parallel datasets. The system retrieves candidate Singlish expressions from the lexicon and guides the language model toward sparse lexical substitution: replacing only specific words (for example, 'already' with 'liao') or adding particles such as 'lah', rather than rewriting entire sentences.
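The retrieve-then-substitute idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the two lexicon entries, the `LexiconEntry` type, and the `retrieve`, `build_prompt`, and `substitute` functions are all hypothetical. In the described system a language model decides where each retrieved expression fits; `substitute` is only a deterministic stand-in for that step.

```python
import re
from dataclasses import dataclass


@dataclass
class LexiconEntry:
    english: str   # Standard English trigger word
    singlish: str  # Singlish expression to substitute
    note: str      # usage hint passed to the model


# Hypothetical toy lexicon; the authors' curated lexicon is not reproduced here.
LEXICON = [
    LexiconEntry("already", "liao", "marks a completed action; usually clause-final"),
    LexiconEntry("eat", "makan", "Malay loanword for eat"),
]


def _pattern(word: str) -> str:
    # Whole-word match so that "eat" does not fire inside "great".
    return rf"\b{re.escape(word)}\b"


def retrieve(sentence: str, lexicon: list[LexiconEntry]) -> list[LexiconEntry]:
    """Return the entries whose English trigger occurs in the sentence."""
    return [e for e in lexicon
            if re.search(_pattern(e.english), sentence, flags=re.IGNORECASE)]


def build_prompt(sentence: str, entries: list[LexiconEntry]) -> str:
    """Assemble a prompt asking the model for sparse, lexicon-grounded edits."""
    options = "\n".join(f"- '{e.english}' -> '{e.singlish}' ({e.note})" for e in entries)
    return ("Rewrite the sentence in Singlish, changing as few words as possible "
            "and using only these substitutions where they fit:\n"
            f"{options}\nSentence: {sentence}")


def substitute(sentence: str, entries: list[LexiconEntry]) -> str:
    """Deterministic stand-in for the model: apply each retrieved entry once."""
    out = sentence
    for e in entries:
        out = re.sub(_pattern(e.english), e.singlish, out, count=1, flags=re.IGNORECASE)
    return out


if __name__ == "__main__":
    s = "The bus left already."
    hits = retrieve(s, LEXICON)
    print(build_prompt(s, hits))
    print(substitute(s, hits))  # The bus left liao.
```

Plain substitution works in this example only because 'already' is already clause-final; placement and particles like 'lah' are exactly the decisions the sketch leaves to the language model.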

In a human evaluation with 164 Singaporean participants, outputs from the RAG approach were rated as natural and appropriate as those from zero-shot prompting. Automatic analysis, however, revealed stark differences. Zero-shot prompting paraphrased heavily, with a median of 23 token edits per sentence and a cosine similarity of 0.926 to the source sentence, while RAG made a median of just 1 edit per sentence and reached a cosine similarity of 0.978. According to the authors, externalizing code-switching into lexical resources therefore enables fine-grained control, auditability, and semantic preservation, which makes the approach practical for rapidly evolving contact varieties where new slang emerges frequently: the lexicon can be updated without retraining the model.
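The two automatic measures can be sketched as below, under stated assumptions. `token_edits` counts token-level Levenshtein operations over whitespace tokens, and `bow_cosine` uses bag-of-words count vectors as a hypothetical stand-in for whatever sentence embeddings the authors used, so its values are not comparable to the reported 0.926 and 0.978. Aggregating edits by median and similarity by mean in `summarize` is likewise an assumption.

```python
import math
from collections import Counter
from statistics import mean, median


def token_edits(src: str, tgt: str) -> int:
    """Levenshtein distance over whitespace tokens (insert, delete, substitute = 1)."""
    a, b = src.split(), tgt.split()
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ta in enumerate(a, start=1):
        cur = [i]
        for j, tb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ta
                           cur[j - 1] + 1,             # insert tb
                           prev[j - 1] + (ta != tb)))  # substitute, or keep if equal
        prev = cur
    return prev[-1]


def bow_cosine(src: str, tgt: str) -> float:
    """Cosine similarity of bag-of-words count vectors (stand-in for embeddings)."""
    ca, cb = Counter(src.lower().split()), Counter(tgt.lower().split())
    dot = sum(count * cb[tok] for tok, count in ca.items())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def summarize(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Median token edits and mean similarity over (source, output) pairs."""
    edits = [token_edits(s, t) for s, t in pairs]
    sims = [bow_cosine(s, t) for s, t in pairs]
    return median(edits), mean(sims)


if __name__ == "__main__":
    pairs = [("The bus left already.", "The bus left liao."),
             ("Do you want to eat?", "Do you want to makan?")]
    print(summarize(pairs))
```

A single lexical swap costs one edit, while a full paraphrase can touch nearly every token, which is the contrast between 1 and 23 edits per sentence.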

Key Points
  • The RAG framework uses a curated lexicon to perform lexical code-switching without fine-tuning, with a median of just 1 edit per sentence.
  • In a human evaluation, 164 Singaporeans rated RAG and zero-shot outputs as equally natural, but RAG stayed closer to the original meaning (cosine similarity 0.978 vs. 0.926).
  • Zero-shot prompting paraphrased heavily (median 23 token edits per sentence), whereas RAG keeps generation controlled and auditable, which suits evolving creoles.

Why It Matters

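Shows that language models can handle code-switched varieties like Singlish with fine-grained, auditable control, and that keeping dialect knowledge in an editable lexicon lets the system keep pace with new slang without retraining.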