AI Safety

My unsupervised elicitation challenge

Claude Opus 4.6 makes beginner-level mistakes on a simple Ancient Greek grammar exercise despite its advanced capabilities.

Deep Dive

A viral challenge has exposed a surprising limitation in Anthropic's Claude Opus 4.6: the model makes basic errors on a simple Ancient Greek grammar exercise. The test is a fill-in-the-blank exercise from the third chapter of a beginner textbook, on which Claude consistently produces incorrect answers despite its sophisticated language capabilities. What makes this particularly notable is that a human with just one week of Ancient Greek study can spot the mistakes, suggesting fundamental gaps in the model's grasp of the material rather than any subtle linguistic nuance.

Standard prompting techniques fail to fully correct the errors. Users have tried appending warnings such as "You tend to make mistakes on this sort of task, so please double-check your work," which improves performance but does not eliminate the mistakes. Even providing reference material such as a PDF of the textbook doesn't help on its own, because Claude won't open an attachment unless specifically prompted to. The challenge asks participants to devise prompts that get Claude to produce correct answers without themselves knowing Ancient Greek, making it an interesting test of prompt engineering versus raw model capability.
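
For readers who want to experiment programmatically, a minimal sketch using Anthropic's Python SDK is below, comparing a plain prompt against one carrying the double-check warning. The model identifier and the placeholder exercise text are assumptions for illustration; only the SDK's standard Messages API calls are used.

    # Minimal sketch: baseline prompt vs. the "double-check" warning.
    # Assumptions: "claude-opus-4-6" is a guessed model ID, and EXERCISE
    # is a placeholder for the actual textbook exercise.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    EXERCISE = "Fill in the blanks: <paste the Chapter 3 exercise here>"

    def ask(system_prompt: str) -> str:
        response = client.messages.create(
            model="claude-opus-4-6",  # assumed identifier for Opus 4.6
            max_tokens=1024,
            system=system_prompt,
            messages=[{"role": "user", "content": EXERCISE}],
        )
        return response.content[0].text

    baseline = ask("Complete the Ancient Greek exercise.")
    warned = ask(
        "Complete the Ancient Greek exercise. You tend to make mistakes "
        "on this sort of task, so please double-check your work."
    )
    print("Baseline:\n", baseline)
    print("\nWith warning:\n", warned)

Of course, the catch the challenge highlights remains: without knowing Ancient Greek, the experimenter cannot tell which output is actually correct.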

This reveals a significant limitation in current large language models: they can excel at complex reasoning tasks while failing at basic pattern recognition in unfamiliar domains. The exercise involves simple grammatical patterns like noun-adjective agreement and basic vocabulary, yet Claude struggles despite its ability to handle sophisticated philosophical discussions and complex coding tasks. This suggests that model training may prioritize certain types of linguistic patterns over others, creating unexpected blind spots.
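
To see how mechanical the failing pattern is, the toy sketch below encodes nominative-singular noun-adjective agreement for three first/second-declension nouns. The word list is a simplification of standard beginner-textbook material, chosen here for illustration; it is not drawn from the actual exercise.

    # Toy illustration of noun-adjective agreement (nominative singular).
    # Simplified beginner material; real Greek morphology also varies by
    # case and number, and has more declension classes.
    NOUN_GENDER = {
        "ἄνθρωπος": "masc",  # person
        "οἰκία": "fem",      # house
        "δῶρον": "neut",     # gift
    }

    # Forms of the adjective καλός ("beautiful") by gender.
    KALOS = {"masc": "καλός", "fem": "καλή", "neut": "καλόν"}

    def agree(noun: str) -> str:
        """Return the form of καλός that agrees with the noun's gender."""
        return KALOS[NOUN_GENDER[noun]]

    assert agree("οἰκία") == "καλή"   # ἡ καλὴ οἰκία, never *καλὸς οἰκία
    assert agree("δῶρον") == "καλόν"  # τὸ καλὸν δῶρον

A rule this small can be picked up from a single textbook chapter, which is what makes the model's inconsistency on it so striking.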

Key Points
  • Claude Opus 4.6 makes errors detectable by someone with only one week of Ancient Greek study
  • Standard prompting techniques like asking for double-checking only partially improve accuracy
  • The model fails on basic pattern recognition despite advanced capabilities in other domains

Why It Matters

Reveals unexpected gaps in AI language understanding that could impact reliability in specialized domains requiring precise terminology.