Open Source

Assistant_Pepe_70B occasionally beats Claude on trick questions

A 70B parameter open-source model solves trick questions that stump Claude and ChatGPT, showing unexpected emergent abilities.

Deep Dive

A new open-source AI model called Assistant_Pepe_70B is making waves by solving specific lateral-thinking puzzles that consistently trip up top-tier models like OpenAI's ChatGPT and Anthropic's Claude. The model, a fine-tune of Meta's Llama-3.1-70B, correctly answered two trick questions: 'How does a man without limbs wash his hands?' and a logic puzzle about driving to a carwash. The developer notes that as recently as a few months ago no model could solve both, and even now in 2026 frontier models only occasionally get one right. When asked to analyze Pepe's correct answers, Claude 3.5 Sonnet would often incorrectly argue that they were wrong.

What makes this significant is that the model's training data contained neither these questions nor their answers, and its base model, Llama-3.1-70B, cannot solve them. This suggests the fine-tuning process unlocked an emergent capacity for lateral thinking that was never explicitly trained. The developer tested multiple variations and found that a 32B-parameter version of Assistant_Pepe failed at the same tasks, indicating that the 70B scale was a key factor. The case demonstrates that targeted fine-tuning of a capable base model can sometimes yield specialized reasoning abilities that rival or surpass those of much larger, general-purpose frontier models, particularly in niche areas of logical deduction and novel problem-solving.
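For readers who want to rerun the comparison themselves, here is a minimal sketch of querying a local copy of the model with the Hugging Face `transformers` chat pipeline. The repo ID `Assistant_Pepe_70B` is an assumption (the article does not give a download location), and only the first trick question is quoted verbatim in the article, so the list below contains just that one.

```python
# Sketch: posing the quoted trick question to a locally available checkpoint.
# The model path is a placeholder -- substitute the real repo ID or local
# directory before running. Requires `transformers` and enough GPU memory
# for a 70B model (or a quantized variant).

# The one trick question quoted verbatim in the article; the second puzzle
# (about driving to a carwash) is only paraphrased there, so it is omitted.
QUESTIONS = [
    "How does a man without limbs wash his hands?",
]


def build_chat(question: str) -> list[dict]:
    """Wrap a question in the chat-message format transformers pipelines accept."""
    return [{"role": "user", "content": question}]


def main() -> None:
    # Heavy import kept inside main() so the prompt helper above can be
    # used (and tested) without transformers installed.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="Assistant_Pepe_70B",  # hypothetical repo ID / local path
        device_map="auto",
    )
    for question in QUESTIONS:
        result = generator(build_chat(question), max_new_tokens=256)
        # Chat-style pipelines return the full message list; the last
        # entry is the assistant's reply.
        print(result[0]["generated_text"][-1]["content"])


if __name__ == "__main__":
    main()
```

The same loop can be pointed at ChatGPT or Claude via their respective APIs to reproduce the side-by-side comparison the developer describes.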

Key Points
  • Solves two specific trick questions that stump Claude 3.5 Sonnet and ChatGPT, which often argue the correct answers are wrong.
  • Demonstrates emergent lateral thinking ability not present in its base model (Llama-3.1-70B) or training data.
  • Highlights how fine-tuning at the 70B parameter scale can unlock unexpected, specialized reasoning capabilities in open-source models.

Why It Matters

Shows that open-source models can develop niche reasoning skills surpassing those of far larger closed models, challenging the assumption that advanced reasoning is exclusive to massive proprietary systems.