Gemma-4-26B-A4B with my coding agent Kon
Open-source coding agent works with local AI models and major providers, using a system prompt of under 270 tokens.
Developer 0xku has launched 'Kon,' a new open-source coding agent designed to run efficiently on local hardware with models like Google's Gemma-4-26B-A4B and Qwen3.5-27B. The project, hosted on GitHub, distinguishes itself with a remarkably small system prompt of under 270 tokens and a strict no-telemetry policy, appealing to developers concerned with privacy and data control. It's built as a simple codebase of under 150 files and is compatible with a wide range of model providers, including local setups via llama-server and major cloud APIs from OpenAI, Anthropic, and Azure.
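The local-model compatibility works because llama-server (from llama.cpp) exposes an OpenAI-compatible HTTP API, so an agent can target a local model the same way it targets a cloud provider. A minimal sketch of assembling such a request, assuming the standard `/v1/chat/completions` endpoint; the port, model name, and prompt are illustrative, not taken from Kon's code:

```python
import json
import urllib.request

# A local server would typically be started with something like:
#   llama-server -m gemma-4-26b-a4b-Q4_K_M.gguf --port 8080
# (model filename is illustrative)

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat completion request for a local llama-server."""
    payload = {
        "model": model,  # llama-server largely ignores this; cloud APIs require it
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8080", "local", "Write a hello-world in Go.")
print(req.full_url)  # http://localhost:8080/v1/chat/completions
```

Because the request shape is identical for OpenAI, Anthropic-compatible gateways, and local servers, switching providers reduces to swapping `base_url` and an API key.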
Kon is not a minimal prototype but a fully featured coding assistant. It supports core agent functionality such as file attachments, slash commands, a skill library defined in an AGENTS.md file, and session forking for handoffs between different models. The developer has tested it extensively on local hardware, such as an RTX 3090 GPU, confirming performance with quantized GGUF models. This positions Kon as a practical, private alternative to commercial coding copilots, giving developers full ownership of their AI-assisted workflow without relying on external servers.
- Runs locally on models like Gemma-4-26B-A4B and Qwen3.5-27B-GGUF, tested on an RTX 3090.
- Features a sub-270-token system prompt and a strict no-telemetry, open-source codebase (<150 files).
- Supports full agent features: attachments, commands, skill libraries, model switching, and session forking/handoff.
Why It Matters
Provides developers a private, locally-hosted alternative to cloud coding assistants, ensuring data never leaves their machine.