SmallCode coding agent hits 87% on benchmarks using a 4B local model
New open-source agent outperforms larger models with clever architecture tricks.
SmallCode is an open-source coding agent built from the ground up for small local models, achieving an 87% benchmark success rate with just a 4B-parameter Gemma model. Its creator, frustrated by how poorly existing agents (OpenCode, Cursor, Claude Code) perform when run on local models, designed SmallCode around their common failure points: tool call crashes, context overflows, and multi-step task collapse. The result beats agents running larger models: OpenCode, for example, scores only ~75% even with 14B models. The harness, not model size, does the heavy lifting.
SmallCode's secret lies in five architectural tricks. Compound tools combine a sequence of calls (e.g., find, read, edit, verify) into one, cutting failures by half. An improvement loop compiles and lints code immediately and feeds the errors back, so the model only has to fix mistakes rather than get everything right on the first try. On repeated failure, tasks are decomposed into smaller pieces. Token budgeting summarizes and truncates context selectively to avoid cutting off in the middle of code. A code graph indexes symbols and their relationships and returns only the relevant snippets. If a task still fails, SmallCode can auto-escalate that subtask to a cloud model, keeping 95% of the work local.
The UI is a full-screen terminal with chat, a command palette, and plugins. It does not yet include LSP integration, multi-session support, or a desktop app. Install via 'npm install -g smallcode' and point it at any OpenAI-compatible endpoint. It is MIT licensed and available on GitHub.
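To make the improvement-loop idea concrete, here is a minimal sketch in Python. It is not SmallCode's actual code: the function names (`check_python`, `improvement_loop`), the prompt wording, and the use of Python's `ast` parser as the "compiler" are all assumptions chosen for illustration, and `fake_model` is a hypothetical stand-in for a call to a local model.

```python
import ast
from typing import Callable, Optional


def check_python(source: str) -> Optional[str]:
    """Stand-in for a compile/lint step: None if the code parses, else an error message."""
    try:
        ast.parse(source)
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def improvement_loop(
    task: str,
    generate: Callable[[str], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask the model for code, check it, and feed any error back until it passes.

    `generate` is a hypothetical wrapper around the model: prompt in, code out.
    """
    prompt = task
    for _ in range(max_attempts):
        candidate = generate(prompt)
        error = check_python(candidate)
        if error is None:
            return candidate
        # The model only has to repair the reported error, not start over.
        prompt = (
            f"{task}\n\nPrevious attempt:\n{candidate}\n\n"
            f"Checker error:\n{error}\nFix only this error."
        )
    # Still failing: a caller could decompose the task or escalate it here.
    return None


def fake_model(prompt: str) -> str:
    """Hypothetical model: wrong on the first try, correct once it sees an error."""
    if "Checker error" in prompt:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b)\n    return a + b\n"  # missing colon
```

Calling `improvement_loop("write add", fake_model)` takes two rounds: the first candidate fails the check, the error is appended to the prompt, and the second candidate passes. Returning `None` after the attempt budget is where the decomposition and escalation steps described above would plausibly hook in.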
- Achieves 87% benchmark score using a 4B-parameter Gemma model, outperforming 14B-based agents.
- Uses compound tools to halve failures from sequential tool calls, plus an automatic improvement loop that feeds compile and lint errors back to the model.
- Open-source (MIT), installable via npm, and integrates with local backends like Ollama and LM Studio.
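As an illustration of the compound-tool idea in the summary above, the sketch below folds find, read, edit, and verify into a single call, so the model issues one tool call instead of four. Everything here is an assumption made for the example: the name `find_and_edit`, the `EditResult` shape, the naive `def name(` search, and the use of Python's `ast` parser as the verify step. SmallCode's real tools may look quite different.

```python
import ast
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class EditResult:
    ok: bool
    path: Optional[str]
    message: str


def find_and_edit(root: str, symbol: str, old: str, new: str) -> EditResult:
    """Find the file defining `symbol`, replace `old` with `new`, and verify the result.

    The file is written only if the edited text still parses.
    """
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8")           # read
        if f"def {symbol}(" not in text:                   # find (naive match)
            continue
        if old not in text:
            return EditResult(False, str(path), f"text to replace not found in {path.name}")
        updated = text.replace(old, new, 1)                # edit (first occurrence)
        try:
            ast.parse(updated)                             # verify
        except SyntaxError as exc:
            return EditResult(False, str(path), f"edit rejected, syntax error at line {exc.lineno}")
        path.write_text(updated, encoding="utf-8")
        return EditResult(True, str(path), "edit applied and verified")
    return EditResult(False, None, f"no definition of {symbol} found")
```

The design point is that every intermediate step happens inside the harness, so there is only one place where a small model can produce a malformed call, and a failed verification leaves the file untouched while returning a short message the model can act on.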
Why It Matters
Enables powerful AI-assisted coding on consumer hardware, cutting cloud dependency while maintaining high accuracy.