Comparative Analysis of Large Language Models in Generating Telugu Responses for Maternal Health Queries
A new study finds that Google's Gemini excels at providing accurate, coherent pregnancy advice in Telugu.
A new research paper titled "Comparative Analysis of Large Language Models in Generating Telugu Responses for Maternal Health Queries" provides a critical benchmark for AI performance in low-resource languages. The study, authored by Anagani Bhanusree, Sai Divya Vissamsetty, K VenkataKrishna Rao, and Rimjhim, evaluated OpenAI's ChatGPT-4o, Google's Gemini, and Perplexity AI on a bilingual dataset of pregnancy-related questions. Expert gynecologists rated responses on parameters such as accuracy, fluency, relevance, coherence, and completeness, while the team also applied semantic similarity metrics such as BERTScore for quantitative analysis.
The findings reveal a clear performance hierarchy. Google's Gemini emerged as the top performer, generating the most accurate and coherent pregnancy advice specifically in Telugu. Interestingly, Perplexity AI demonstrated strong capabilities when the initial prompts were given in Telugu, suggesting prompt language significantly impacts output quality. The study notes that ChatGPT-4o's performance in this domain has room for improvement. This research underscores a major gap in AI development: while LLMs show prowess in English, their utility in critical, real-world applications like healthcare for non-English speakers remains inconsistent and requires targeted enhancement.
- Gemini outperformed ChatGPT-4o and Perplexity AI in generating accurate and coherent Telugu responses for maternal health queries, as rated by expert gynecologists.
- The study used a bilingual dataset and combined expert assessment with BERTScore semantic similarity metrics to evaluate models on fluency, relevance, and completeness.
- Prompt language proved crucial; Perplexity AI performed well when queries were input in Telugu, highlighting the importance of localized interaction.
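The semantic-similarity side of the evaluation above can be illustrated in code. The paper's exact pipeline is not specified here, and real BERTScore matches token-level contextual embeddings from a multilingual BERT model; the following is only a minimal stand-in that scores a candidate response against a reference answer via cosine similarity of term-frequency vectors. All strings and function names are hypothetical.

```python
from collections import Counter
import math

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_score(candidate: str, reference: str) -> float:
    # Whitespace tokenization as a simplification; BERTScore instead
    # aligns contextual token embeddings and reports precision/recall/F1.
    return cosine_similarity(Counter(candidate.split()),
                             Counter(reference.split()))

# Hypothetical comparison of a model response against a
# gynecologist-approved reference answer (illustrative strings only).
score = similarity_score("folic acid supports fetal development",
                         "folic acid supports healthy fetal development")
```

In the study's setting, such a score would complement (not replace) the expert ratings, since surface similarity alone cannot detect medically incorrect advice.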
Why It Matters
The study exposes a critical gap in global AI accessibility, showing that model performance on vital healthcare information varies drastically by language.