AI Safety

Estimating METR Time Horizons for Claude Opus 4.6 and GPT 5.3 Codex (xhigh)

A new crowd-sourced estimate projects which frontier model will lead on a key agentic benchmark...

Deep Dive

A new analysis estimates the METR time horizons for Claude Opus 4.6 and the rumored GPT 5.3 Codex. The METR time horizon measures the length of a task, in human-expert time, that an AI can complete on its own at a given reliability. The crowd-sourced prediction puts GPT 5.3 Codex ahead with an 8.7-hour horizon versus Opus 4.6's 7.9 hours. The methodology extends the Epoch Capabilities Index, using agentic benchmarks such as SWE-Bench Pro to model these horizons.
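To make the metric concrete: a time horizon is typically estimated by fitting a success-vs-task-length curve and reading off where success probability crosses 50%. The sketch below is a minimal illustration of that idea, not the analysis's actual code; the task durations and outcomes are invented, and the grid-search logistic fit stands in for a proper maximum-likelihood procedure.

```python
import math

# Hypothetical per-task results: (human-expert minutes to complete, AI succeeded?).
# Illustrative numbers only, not real benchmark data.
tasks = [
    (5, True), (15, True), (30, True), (60, True),
    (120, True), (240, True), (480, False), (960, False),
]

def horizon_50(tasks, lo=1.0, hi=10_000.0):
    """Estimate the task length (minutes) at which success probability
    crosses 50%, by fitting a logistic curve in log-duration with a
    crude grid search over (midpoint, slope)."""
    best_mid, best_ll = None, -math.inf
    midpoints = [lo * (hi / lo) ** (i / 200) for i in range(201)]
    for mid in midpoints:
        for slope in (0.5, 1.0, 2.0, 4.0):
            ll = 0.0
            for minutes, ok in tasks:
                # Success probability falls as tasks get longer than `mid`.
                p = 1.0 / (1.0 + math.exp(slope * (math.log(minutes) - math.log(mid))))
                p = min(max(p, 1e-9), 1 - 1e-9)  # avoid log(0)
                ll += math.log(p if ok else 1 - p)
            if ll > best_ll:
                best_ll, best_mid = ll, mid
    return best_mid

print(f"Estimated 50% time horizon: {horizon_50(tasks):.0f} minutes")
```

With the toy data above, the fitted horizon lands between the longest success (240 min) and the shortest failure (480 min), which is the intuition behind headline figures like "8.7 hours".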

Why It Matters

These estimates are a leading indicator of which model will dominate complex, real-world software and reasoning tasks for developers.