Models & Releases

Improving OpenAI Codex with Repo-Specific Context

Adding structured git history context to OpenAI Codex improved task resolution by up to 5.3 percentage points.

Deep Dive

The team behind Codeset has demonstrated that their method for providing AI coding assistants with structured repository context works effectively across different models. After previously showing a 7–10 percentage point improvement for Claude Code, they applied the same evaluation to OpenAI Codex (GPT-5.4). The results were consistent: on their proprietary codeset-gym-python benchmark of 150 tasks, Codex's task resolution rate increased from 60.7% to 66.0%, a gain of 5.3 percentage points. On the broader SWE-Bench Pro benchmark with 400 randomly sampled tasks, performance improved from 56.5% to 58.5%, a 2-point gain.

Codeset's approach is infrastructure-light. Instead of using retrieval-augmented generation (RAG) or vector databases at query time, it runs a one-time pipeline over a repository's git history. This pipeline generates static files containing key context such as past bugs per file with their root causes, known pitfalls, co-change relationships, and test checklists. These files live directly in the repo, and an AI agent reads them as part of its normal context window during development tasks. The company offers this as a one-time service for $5 per repository, with a free trial available, positioning it as a simple, effective way to boost AI coding assistant accuracy.
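Codeset has not published its pipeline, but one ingredient it names, co-change relationships, is straightforward to mine from git history. As an illustration only (the function names and the `git log` parsing step below are assumptions, not Codeset's actual implementation), a minimal co-change miner might look like:

```python
from collections import Counter
from itertools import combinations

def co_change_counts(commits):
    """Count how often each pair of files changes in the same commit.

    `commits` is a list of file-path lists, one list per commit, e.g.
    parsed from `git log --name-only --pretty=format:` output by
    splitting on blank lines (parsing step omitted here).
    """
    counts = Counter()
    for files in commits:
        # Sort and de-duplicate so each unordered pair is counted once.
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts

def top_co_changes(commits, n=5):
    """Return the n most frequently co-changed file pairs."""
    return co_change_counts(commits).most_common(n)

# Synthetic history: parser.py and lexer.py change together twice,
# so they surface as the strongest co-change pair.
history = [
    ["parser.py", "lexer.py"],
    ["parser.py", "lexer.py", "tests/test_parser.py"],
    ["README.md"],
]
print(top_co_changes(history, n=1))  # [(('lexer.py', 'parser.py'), 2)]
```

The output of a pass like this could then be written to a static file in the repo (e.g. a markdown table of frequently co-changed pairs), which is all an agent needs to read at task time; no index or retrieval service is involved.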

Key Points
  • OpenAI Codex (GPT-5.4) performance improved by 5.3 percentage points on a 150-task benchmark using Codeset's context.
  • The method generates static files from git history (bugs, test checklists) for AI agents to read, avoiding complex RAG systems.
  • Results mirror earlier 7–10pp gains with Claude Code, validating the approach's effectiveness across multiple AI models.

Why It Matters

This provides a simple, model-agnostic way to significantly improve AI coding assistants' accuracy and context-awareness for real-world software projects.