Grok 4.3 underperforms Grok 4.20 0309 on the Extended NYT Connections Benchmark, scoring 67.5 versus 93.4, though at a lower cost than the earlier Grok 4.20 run
The new model scores 67.5 against its predecessor's 93.4, but the run cost less.
Deep Dive
A GitHub repository for the NYT Connections benchmark has been shared on Reddit.
Key Points
- Grok 4.3 scored 67.5 on the Extended NYT Connections Benchmark, down from Grok 4.20 0309's 93.4.
- The Grok 4.3 run cost less than the earlier Grok 4.20 run.
- The benchmark results were published on GitHub by lechmazur and shared on Reddit by /u/zero0_one1.
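To put the gap in perspective, the short sketch below works out the absolute and relative drop from the two reported scores. The variable names are illustrative, and it assumes both scores are on the same scale so that they can be subtracted directly.

```python
# Scores as reported in the summary (assumed to be on the same scale).
old_score = 93.4  # Grok 4.20 0309
new_score = 67.5  # Grok 4.3

# Absolute gap in benchmark points.
absolute_drop = old_score - new_score

# Gap as a fraction of the earlier model's score.
relative_drop = absolute_drop / old_score

print(f"Absolute drop: {absolute_drop:.1f} points")  # 25.9 points
print(f"Relative drop: {relative_drop:.1%}")         # 27.7%
```

The cost side of the trade-off cannot be quantified the same way, because the summary gives no cost figures.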
Why It Matters
The result suggests that lower cost can come at the expense of reasoning performance, at least on this benchmark.