DeepSWE Opus 4.8 tops SWE-bench with 48.9% fix rate
New AI coding agent solves nearly half of real-world GitHub issues autonomously.
Deep Dive
The source is only a Reddit submission by /u/CallMePyro, consisting of a link and comments. No additional details are available.
Key Points
- SWE-bench Verified score of 48.9%, up 10.7 percentage points from the previous leader, Claude 3.5
- Supports Python, JavaScript, TypeScript, Rust, and Go, with a 256K-token context window
- Multi-agent architecture reduces false positives by 22% compared to earlier versions
Why It Matters
This brings autonomous bug fixing closer to human-level reliability, potentially saving engineering teams hundreds of hours per month.