AI Safety

Closed AI Models Are Months Ahead of Open-Weight, and a New Metric Shows the Gap Persists

Epoch's ECI shows open-weight models trail the frontier by several months.

Deep Dive

RobinHa analyzed open-weight AI model capability using Epoch's ECI (Epoch Capabilities Index), which applies IRT (Item Response Theory) to benchmark scores. This approach improves on the simpler AA Index used in a viral Twitter post, whose trend collapsed at the most recent data points because the index makes no logistic assumptions. The resulting interactive graph plots each open-weight frontier model against the date a closed model first achieved the same ECI score, revealing a consistent lag of several months. A second graph shows the running best ECI over time for open and closed models, confirming that closed models have kept a persistent lead, though the size of the gap varies.
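The lag calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not RobinHa's actual code: the function names `running_best` and `lag_in_days` are hypothetical, the scores and dates are made-up placeholders rather than real ECI values, and "the same ECI score" is interpreted as "the first closed model to reach at least that score".

```python
from datetime import date

def running_best(releases):
    """Keep only the releases that set a new best score (the 'frontier')."""
    frontier = []
    best = float("-inf")
    for day, score in sorted(releases):
        if score > best:
            best = score
            frontier.append((day, score))
    return frontier

def lag_in_days(open_releases, closed_releases):
    """For each open-weight frontier model, count the days since a closed
    model first reached at least the same score."""
    closed_frontier = running_best(closed_releases)
    lags = []
    for day, score in running_best(open_releases):
        # Earliest closed release whose score matches or exceeds this one.
        matched = next((d for d, s in closed_frontier if s >= score), None)
        if matched is not None:  # skip scores no closed model has reached
            lags.append((day, score, (day - matched).days))
    return lags

# Made-up placeholder data: (release date, capability score).
closed_models = [(date(2024, 1, 1), 100), (date(2024, 6, 1), 120)]
open_models = [
    (date(2024, 4, 1), 100),
    (date(2024, 5, 1), 95),   # not a new open-weight best, so it is ignored
    (date(2024, 12, 1), 118),
]

for day, score, lag in lag_in_days(open_models, closed_models):
    print(f"{day}: score {score}, lag {lag} days")
```

Passing two labs' release lists instead of open and closed models would give the same kind of head-to-head comparison; a negative lag would mean the first list reached that score earlier.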

Notably, GLM-5.2 has not yet been scored, so its impact remains unknown. The same method generalizes to other comparisons, such as the OpenAI vs. Anthropic rivalry: it can plot how many months ahead of (or behind) Anthropic each OpenAI frontier model was at a given ECI. Commenters on LessWrong praised the tool's potential for tracking Chinese versus American progress and for distinguishing trustworthy from untrustworthy companies. Companies releasing open-weight models, such as DeepSeek, were flagged as operating at lower security standards, raising concerns about rogue replication.

Key Points
  • ECI uses IRT for more accurate capability measurement than the AA Index.
  • Open-weight frontier models typically trail closed models by several months in ECI score.
  • GLM-5.2 remains unscored, and the tool can also track rivalries like OpenAI vs. Anthropic.

Why It Matters

Quantifying the open-weight lag helps gauge how quickly AI capabilities diffuse beyond frontier labs.
