Open Source

Local Claude Code with Qwen3.5 27B

A developer bypassed Claude Code's cloud dependency, running it fully offline with a 27B local model.

Deep Dive

A developer has engineered a method to run Anthropic's Claude Code extension in a fully offline, private environment. By intercepting its API calls and redirecting them to a local llama.cpp server, they bypassed the tool's cloud dependency. The setup uses the Qwen3.5-27B model quantized to Q4_K_M, running on an AMD Strix Halo system with specialized ROCm libraries for GPU acceleration. Environment variables and configuration files were modified to disable all telemetry, auto-updaters, and non-essential traffic, creating a truly local coding assistant.
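The redirection described above can be sketched as a shell setup, assuming Claude Code's documented `ANTHROPIC_BASE_URL` override and telemetry/auto-update kill switches, and llama.cpp's `llama-server` CLI. The model path, port, and dummy key are illustrative, not the author's exact configuration:

```shell
#!/usr/bin/env sh
# Serve the quantized model locally (model path and flags are illustrative).
llama-server \
  -m ./Qwen3.5-27B-Q4_K_M.gguf \
  --ctx-size 65536 \
  --port 8080 &

# Point Claude Code at the local endpoint instead of Anthropic's API.
export ANTHROPIC_BASE_URL="http://127.0.0.1:8080"
export ANTHROPIC_API_KEY="local-dummy-key"   # placeholder; the local server ignores it

# Silence telemetry, auto-updates, and other non-essential traffic.
export DISABLE_TELEMETRY=1
export DISABLE_AUTOUPDATER=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

claude
```

Because everything runs against 127.0.0.1, no request ever leaves the machine once the model weights are on disk.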

Performance testing across seven coding tasks revealed clear trade-offs. Generation speed degraded by about 24% as context grew, from 9.71 tokens/second at 23K context to 7.42 t/s near the 65K limit, and the system prompt alone consumed 22,870 tokens (35% of the budget). The biggest limitation was a hard context wall at 65,535 tokens: Claude Code's internal auto-compaction logic assumes a 200K window, so it never triggered, and the /compact command failed because its 4,096-token output limit was too small for summarization. Code quality was rated 7-8.5/10 and tool chaining worked, but web search was completely broken without Anthropic's backend.
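The reported numbers are internally consistent, which a few lines of arithmetic confirm (constants below are the figures from the write-up, not new measurements):

```python
# Sanity-check the context-budget and speed figures reported above.
CTX_LIMIT = 65_535       # hard context wall observed
SYSTEM_PROMPT = 22_870   # tokens consumed by Claude Code's system prompt

prompt_share = SYSTEM_PROMPT / CTX_LIMIT
print(f"System prompt share: {prompt_share:.1%}")  # ~34.9%, i.e. the reported 35%

# Generation-speed degradation from 23K to 65K context.
t_low, t_high = 9.71, 7.42  # tokens/second at low vs. high context
drop = (t_low - t_high) / t_low
print(f"Speed drop: {drop:.1%}")  # ~23.6%, i.e. the reported 24%

# Tokens actually left for files, code, and conversation.
print(f"Usable budget: {CTX_LIMIT - SYSTEM_PROMPT} tokens")
```

The last figure is the practical ceiling: only about 42.7K tokens remain for the actual coding session before the wall is hit.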

Key Points
  • Successfully routed Claude Code API to a local llama.cpp server running Qwen3.5 27B (Q4_K_M quant)
  • Disabled all telemetry and cloud features, but hit a 65K context wall and lost web search
  • Measured a 24% generation-speed drop (9.71 to 7.42 t/s) as context grew, with the system prompt alone consuming 35% of the context budget

Why It Matters

This demonstrates that complex AI coding tools can run fully offline and private, though with significant trade-offs in context window and features.