Media & Culture

DeepSWE Opus 4.8 tops SWE-bench with 48.9% fix rate

New AI coding agent autonomously resolves nearly half of the real-world GitHub issues in the SWE-bench Verified benchmark.

Deep Dive

The source consists only of a link submission by /u/CallMePyro and its comments. It provides no additional details.

Key Points
  • SWE-bench Verified score of 48.9%, up 10.7 points from the previous leader, Claude 3.5
  • Supports Python, JavaScript, TypeScript, Rust, and Go, with a 256K-token context window
  • Multi-agent architecture reduces false positives by 22% compared to earlier versions

Why It Matters

A higher fix rate brings autonomous bug fixing closer to human-level reliability and could save engineering teams hundreds of hours per month.