Media & Culture

Grok 4.3 scores 67.5 on the Extended NYT Connections Benchmark, down from 93.4 for Grok 4.20 0309, but at a lower cost than the earlier Grok 4.20 run

The new model scores 67.5 versus the previous 93.4, a drop of 25.9 points, but at a lower cost.

Deep Dive

A GitHub repository for the NYT Connections benchmark has been shared on Reddit.

Key Points
  • Grok 4.3 scored 67.5 on the Extended NYT Connections Benchmark, down 25.9 points from the 93.4 scored by Grok 4.20 0309.
  • The Grok 4.3 run cost less than the earlier Grok 4.20 run.
  • The benchmark results were published on GitHub by lechmazur and shared on Reddit by /u/zero0_one1.

Why It Matters

The result suggests that, at least on this benchmark, a lower-cost model can come with weaker reasoning performance.