Open Source

I'm done with using local LLMs for coding

Claude Code crushes Qwen 27B and Gemma 4 31B in real-world tests...

Deep Dive

A developer detailed their frustrations with local LLMs for coding after spending weeks testing Qwen 27B and Gemma 4 31B, considered top-tier local models. The main pain points: poor decision-making and unreliable tool calls. When asked to Dockerize a GitHub repo, the local models failed to handle a long-running 'docker build' command, assumed it had failed, and invented false causes (blaming 'torchcodec', for instance) instead of checking the actual output. Even with a custom AGENTS.md file instructing the model to pipe output to a file and delegate to subagents, the LLM repeatedly read the full Docker output, bloating sessions to 250k input tokens.
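To make that AGENTS.md guidance concrete, here is a minimal Python sketch of the pipe-to-file pattern, assuming the goal is to check the exit code and the log's tail rather than feed the entire build output back into the session; the helper name and tail length are illustrative, not from the article.

```python
import subprocess

# Hypothetical helper: run a long command with output redirected to a log
# file, then surface only the exit code and the last few lines.
def run_logged(cmd, log_path, tail_lines=40):
    with open(log_path, "w") as log:
        # The long-running 'docker build' writes here, not into the context.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    with open(log_path) as log:
        tail = "".join(log.readlines()[-tail_lines:])
    return result.returncode, tail

# Check the real outcome before diagnosing a failure.
code, tail = run_logged(["docker", "build", "-t", "demo", "."], "build.log")
print(f"exit code: {code}\n{tail}")
```

An agent following this pattern keeps the session small: a successful build contributes one exit code and a short tail instead of the full layer-by-layer output.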

Performance was another major letdown. The developer noted slow inference and frequently broken prompt caches, leading to long pauses with no visible output, which is especially frustrating in Claude Code since it doesn't stream the LLM's output. Ultimately, the developer concluded that local models offer no learning advantage over cloud ones, just more grief. The verdict: for professional coding, the productivity loss isn't worth the privacy benefits of local models.
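A back-of-the-envelope calculation shows why a broken prompt cache turns into a long, silent pause; the prefill throughput below is an assumed figure for illustration, since the article doesn't give one.

```python
# Assumption: local prefill throughput of ~500 tokens/s (varies widely by
# hardware and model). When the cache breaks, the whole context is re-prefilled.
PREFILL_TOKENS_PER_SEC = 500
CONTEXT_TOKENS = 250_000  # session size reported in the article

stall_minutes = CONTEXT_TOKENS / PREFILL_TOKENS_PER_SEC / 60
print(f"~{stall_minutes:.1f} minutes of silence per cache break")  # ~8.3 minutes
```

With no streamed output to watch, each of these stalls is indistinguishable from a hang.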

Key Points
  • Local models (Qwen 27B, Gemma 4 31B) struggle with tool-call reliability on tasks like Docker builds, often inventing false causes for errors instead of checking output
  • Broken prompt caches cause long, silent pauses, while re-reading full command output bloats sessions to 250k input tokens
  • Developer found no learning benefit over cloud models like Claude Code, just more frustration

Why It Matters

Highlights the still-wide gap between local and cloud LLMs for professional coding, a gap that directly shapes whether developers adopt local models or stick with cloud tools despite the privacy trade-off.