DeepSWE benchmark reveals widening gap between closed and open-source AI models
Proprietary models now lead by more than 20 percentage points as open-source models struggle to keep pace.
Deep Dive
Previously, closed and open-source models were separated by only a few points. According to the image, that gap has since grown. The author finds this quite disappointing and hopes open source can close the distance.
Key Points
- DeepSWE benchmark shows proprietary models leading open-source by over 20 percentage points in coding and reasoning tasks.
- GPT-4o and Claude 3.5 top the leaderboard, while Llama 3.1 and Mistral trail significantly.
- The widening gap threatens AI accessibility for startups and researchers reliant on open-source models.
Why It Matters
A widening performance gap could limit AI access for smaller players and slow open-source innovation.