AI Safety

AIs can now often do massive easy-to-verify SWE tasks and I've updated towards shorter timelines

AI Alignment Forum April 07, 2026

⚡ New analysis argues AI can now often handle massive, easy-to-verify software tasks that would take humans years.

Deep Dive

A prominent AI forecaster has published a significant update towards "substantially shorter AI timelines" based on recent AI results. The core change is a near doubling of the estimated probability of full AI R&D automation by the end of 2028, from around 15% to just under 30%. The analysis also predicts that by the end of 2026, AI systems will complete massive, easy-to-verify software engineering (ES) tasks with 50% reliability; these are projects that would take human engineers years or even decades. The prediction applies specifically to tasks that don't require novel ideation (ESNI tasks), which rely instead on existing knowledge and rigorous verification.

The forecast revision is driven by several concrete demonstrations. Models such as Anthropic's Claude Opus 4.5 and 4.6 and OpenAI's Codex 5.2 series have consistently exceeded expectations on benchmarks. More compellingly, the forecaster cites specific, non-contaminated projects in which AI, with only "moderately sophisticated scaffolding," autonomously completed work at a scale previously thought impossible in the short term. Examples include an almost entirely AI-written C compiler and significant cybersecurity results. These demos suggest that the current "50% reliability time-horizon" for such massive ESNI tasks, using public models, is already between one month and several years when cost is equated to human labor. The forecaster also anticipates a major scale-up of training compute in 2026, which is expected to accelerate progress further.

Key Points

Probability of full AI R&D automation by EOY 2028 nearly doubled from ~15% to ~30%.
Predicts by EOY 2026, AIs will have 50% reliability on software tasks equivalent to years/decades of human work.
Update driven by outperforming models (Claude Opus 4.5/4.6, Codex 5.2) and demos like an AI-written C compiler.

Why It Matters

If the forecast holds, AI could automate complex, large-scale software development years earlier than many experts anticipated, reshaping tech labor.

