Viral Wire

DeepSeek V4 trails US frontier models by 3-8 months, NIST and Epoch disagree

Perplexity Discover May 17, 2026

⚡Competing benchmarks show open-weight models nearly catching ChatGPT—but agentic tasks remain a question mark.

Deep Dive

May 2026 saw a record influx of open-weight AI releases: Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, and GLM-5.1 all dropped within weeks of one another. NIST's Center for AI Standards and Innovation (CAISI) has now formally evaluated DeepSeek V4, concluding that the model lags U.S. frontier models by roughly 8 months and performs at the level of GPT-5 from last August. Epoch AI's capabilities index, meanwhile, measures the gap at just 3–7 months, sparking debate over which metric matters more. Both evaluations use simplified setups that may not capture real-world agentic performance, where models chain dozens of steps and execute complex workflows. That is a strength closed models like ChatGPT and Claude likely still hold.

Beyond the raw numbers, this gap has become a strategic battleground. Closed-model providers tout the distance to justify subscription pricing, while open-source advocates argue the difference is shrinking fast. Open-weight models can be downloaded and run locally without per-token fees or data sent to third parties, making them increasingly viable for businesses. However, agentic tasks—where reliability over long multi-step processes is critical—remain an area where closed models likely keep an edge.

Separately, a growing legal risk emerged this month: employers citing "AI fluency" in hiring or firing decisions face age discrimination lawsuits. A Daily Journal analysis notes that similar phrases like "digital native" have already been ruled biased. In April 2026, 26% of 88,387 layoffs were blamed on AI, up 38% from March. The class action Mobley v. Workday claims AI hiring software screened out 1.1 billion applications from workers over 40. Illinois enacted a ban on AI-driven discrimination in January. These cases and evolving precedent suggest companies must offer training to older workers before citing AI skill gaps as grounds for termination.

Key Points

NIST's CAISI evaluation puts DeepSeek V4 about 8 months behind U.S. frontier models, roughly at the level of GPT-5 from last August; Epoch AI puts the gap at 3–7 months.
Open-weight models (Gemma 4, Kimi K2.6, MiMo 2.5, GLM-5.1) offer self-hosted, cost-effective alternatives for developers.
Agentic performance—multi-step task execution—still likely favors closed models, making benchmark comparisons incomplete.

Why It Matters

Open-weight models are closing in on frontier AI, cutting costs for businesses, while "AI fluency" hiring bias creates new legal exposure.
