Study: ChatGPT and Gemini Outperform DeepSeek as AI Teaching Partners

arXiv cs.CY March 31, 2026

⚡ New research compares ChatGPT, Gemini, and DeepSeek across three classic teaching strategies for programming.

Deep Dive

A new study from researchers at the University of São Paulo provides one of the first empirical comparisons of popular large language models (LLMs) acting as teaching partners. The team evaluated OpenAI's ChatGPT, Google's Gemini, and DeepSeek's eponymous model across three established pedagogical strategies: providing Examples, crafting Explanations and Analogies, and employing the Socratic Method. The evaluation was conducted in the specific context of teaching introductory C programming, with six human judges scoring the AI-generated interactions.

Results revealed that ChatGPT and Gemini consistently outperformed DeepSeek across most criteria. The models showed similar interaction patterns for the simpler strategies of providing Examples and Explanations. However, their performance diverged significantly with the more complex Socratic Method, where the models proved highly sensitive to the initial prompt's phrasing and structure. This suggests that while LLMs can mimic basic instructional techniques, their ability to guide a student through probing, open-ended dialogue varies considerably and depends heavily on user input.

The findings offer a reality check on the promise of AI tutors. They indicate that leading models are not equally adept at pedagogy and that their effectiveness varies across teaching styles. For educators and edtech developers, this underscores the importance of model selection and prompt engineering, especially for advanced interactive teaching methods. The research also establishes a formal evaluation protocol that can be used to benchmark future AI teaching agents.

Key Points

ChatGPT and Gemini scored higher than DeepSeek when evaluated as teaching agents for C programming.
LLM performance was most variable and prompt-sensitive when using the complex Socratic Method teaching strategy.
Six human judges evaluated the models across three pedagogical strategies: Examples, Explanations/Analogies, and Socratic dialogue.

Why It Matters

Provides a data-driven framework for selecting and prompting AI models in educational settings, moving beyond hype.
