Models & Releases

Is Anthropic's 'Sama' gunning for GPT-4o?

Leaked benchmarks suggest Sama processes roughly 1M tokens per second, outpacing GPT-4o and Claude 3.5 Sonnet.

Deep Dive

Anthropic, the AI safety company behind Claude, is reportedly testing a model internally codenamed 'Sama' that achieves unprecedented inference speeds. According to leaked benchmarks circulating on developer forums, Sama can process approximately 1 million tokens per second, roughly 10 times faster than OpenAI's GPT-4o and Anthropic's own Claude 3.5 Sonnet. This performance leap is said to come from a novel mixture-of-experts architecture optimized for parallel token generation, potentially cutting latency for enterprise applications to milliseconds.
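To make the mixture-of-experts idea concrete, here is a minimal sketch of top-k expert routing in NumPy. This is purely illustrative and not based on any leaked Sama details: the function names, shapes, and the use of plain matrix multiplies as "experts" are all assumptions for the example.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Illustrative top-k mixture-of-experts layer (not Anthropic's design).

    x:       (n_tokens, d) token activations
    gate_w:  (d, n_experts) router weights
    experts: list of (d, d) weight matrices, one per expert
    """
    logits = x @ gate_w  # router scores, shape (n_tokens, n_experts)
    # Indices of each token's top-k experts by router score.
    top = np.argsort(logits, axis=1)[:, -top_k:]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Softmax over only the selected experts' scores.
        scores = logits[t, top[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # Each token runs through just top_k experts, so tokens can be
        # dispatched to different experts in parallel -- the source of
        # MoE's speed advantage over a single dense layer.
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ experts[e])
    return out
```

Because each token touches only `top_k` of the experts, total compute per token stays roughly constant as the expert count grows, which is why MoE models can scale parameters without a proportional latency hit.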

While raw speed is the headline, early evaluations suggest Sama doesn't sacrifice capability for velocity. The model reportedly scores around 85% on the MMLU benchmark, keeping it competitive with top-tier models on reasoning tasks. That combination could enable a new generation of real-time AI agents for trading, customer service, and interactive media. However, Anthropic hasn't officially confirmed Sama's existence, leaving the AI community speculating about a potential late-2024 release that could reset the speed bar for large language models.

Key Points
  • Reported 1M tokens/second processing speed, 10x faster than GPT-4o
  • Maintains ~85% MMLU score despite massive speed optimization
  • Uses novel mixture-of-experts architecture for parallel generation
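A quick back-of-the-envelope check shows what the reported figures would mean in practice, taking the leaked 1M tokens/second and the claimed 10x gap over GPT-4o at face value (both are unconfirmed rumors):

```python
def response_latency_ms(tokens, tokens_per_second):
    """Wall-clock time in milliseconds to generate `tokens` at a given throughput."""
    return tokens / tokens_per_second * 1000

sama_tps = 1_000_000          # the leaked (unverified) figure
baseline_tps = sama_tps / 10  # a model 10x slower, per the reported comparison

print(response_latency_ms(1_000, sama_tps))      # 1.0 ms for a 1,000-token reply
print(response_latency_ms(1_000, baseline_tps))  # 10.0 ms at the baseline rate
```

At those rates a full 1,000-token response would land in about a millisecond, well inside the latency budget of interactive applications like trading or game agents.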

Why It Matters

Could enable real-time AI agents for finance, gaming, and customer service where latency is critical.