Is Anthropic's 'Sama' Taking Aim at OpenAI?
Leaked benchmarks suggest Sama processes 1M tokens per second, outpacing GPT-4o and Claude 3.5 Sonnet.
Anthropic, the AI safety company behind Claude, is reportedly testing a model internally codenamed 'Sama' that achieves unprecedented inference speeds. According to leaked benchmarks circulating on developer forums, Sama processes approximately 1 million tokens per second, roughly 10 times faster than OpenAI's GPT-4o or Anthropic's own Claude 3.5 Sonnet. The leap is attributed to a novel mixture-of-experts architecture optimized for parallel token generation, which could cut end-to-end latency for enterprise applications to milliseconds.
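To put the millisecond claim in perspective, a back-of-the-envelope calculation shows what the leaked throughput figure would imply. This is a rough sketch based only on the rumored numbers; the 100 tokens/second comparison point is an illustrative assumption, not a measured figure for any specific model.

```python
# Back-of-the-envelope latency estimate from the leaked throughput figure.
# All throughput numbers are from the rumor or assumed for comparison,
# not confirmed by Anthropic.

def generation_time_ms(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a given throughput, in milliseconds."""
    return num_tokens / tokens_per_second * 1000

sama_tps = 1_000_000   # reported Sama throughput
typical_tps = 100      # assumed throughput for a typical production model

print(generation_time_ms(1000, sama_tps))     # 1.0  -> ~1 ms for 1,000 tokens
print(generation_time_ms(1000, typical_tps))  # 10000.0 -> ~10 s at 100 tok/s
```

If the leaked figure holds, a full 1,000-token response would arrive in about a millisecond, versus roughly ten seconds at more conventional serving speeds.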
While raw speed is the headline, early evaluations suggest Sama doesn't trade capability for velocity. The model reportedly scores around 85% on the MMLU benchmark, keeping it competitive with top-tier models on reasoning tasks. That combination could enable a new generation of real-time AI agents for trading, customer service, and interactive media. However, Anthropic has not officially confirmed Sama's existence, leaving the AI community speculating about a potential late-2024 release that could reset the speed bar for large language models.
- Reported 1M tokens/second processing speed, roughly 10x faster than GPT-4o
- Maintains ~85% MMLU score despite massive speed optimization
- Uses novel mixture-of-experts architecture for parallel generation
Why It Matters
Could enable real-time AI agents for finance, gaming, and customer service where latency is critical.