Qwen3 4B outperforms cloud agents on code tasks—with Mahoraga research [R]
A developer known as pockanoodles has released Mahoraga, an open-source AI orchestrator that intelligently routes tasks across local and cloud models using a contextual bandit algorithm (LinUCB). Built out of necessity when the developer ran out of cloud credits, Mahoraga learns from every routing decision to optimize cost and quality. The system runs entirely on a 16GB MacBook Pro (M-series, Nov 2024) and handles tasks through a two-stage routing pipeline: first, a keyword classifier buckets the task into categories like code, plan, or research; then the bandit selects the best agent within that bucket using a 9-dimensional context vector.
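The two-stage pipeline above can be sketched roughly as follows. This is a hedged illustration, not Mahoraga's actual code: the bucket keywords, agent names, and feature layout are assumptions, and only the LinUCB mechanics (per-arm ridge regression with an upper-confidence bonus) follow the standard algorithm named in the post.

```python
import numpy as np

# Illustrative buckets and agent pools (assumptions, not Mahoraga's real config)
BUCKETS = {
    "code": ["implement", "refactor", "bug", "function"],
    "plan": ["roadmap", "steps", "design"],
    "research": ["survey", "compare", "investigate"],
}
AGENTS = {
    "code": ["qwen3-4b", "lfm2", "deepseek-r1", "cloud-cli"],
    "plan": ["qwen3-4b", "cloud-cli"],
    "research": ["cloud-cli", "qwen3-4b"],
}
D = 9  # context-vector dimension, as stated in the post

def classify(task: str) -> str:
    """Stage 1: keyword classifier buckets the task."""
    words = task.lower()
    for bucket, keys in BUCKETS.items():
        if any(k in words for k in keys):
            return bucket
    return "code"  # fallback bucket (assumption)

class LinUCB:
    """Stage 2: one ridge model (A, b) per arm; score = x·theta + alpha*sqrt(x'A^-1 x)."""
    def __init__(self, arms, d=D, alpha=1.0):
        self.alpha = alpha
        self.A = {a: np.eye(d) for a in arms}    # d x d design matrix per arm
        self.b = {a: np.zeros(d) for a in arms}  # reward-weighted context sums

    def select(self, x):
        best, best_ucb = None, -np.inf
        for a in self.A:
            Ainv = np.linalg.inv(self.A[a])
            theta = Ainv @ self.b[a]             # ridge estimate of arm quality
            ucb = theta @ x + self.alpha * np.sqrt(x @ Ainv @ x)
            if ucb > best_ucb:
                best, best_ucb = a, ucb
        return best

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# One bandit per bucket, learning from every decision
routers = {bucket: LinUCB(agents) for bucket, agents in AGENTS.items()}
```

Routing a task is then `routers[classify(task)].select(x)`, followed by `update()` once a quality score for the chosen agent's output is observed.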
In a benchmark of 192 code-generation tasks across 8 agents (4 local Ollama models, 4 cloud CLIs), Qwen3 4B in nothink mode emerged as the standout performer, achieving 33.8 tokens/second with just 6.1 seconds average latency. It measurably outperformed the cloud agents on code and refactoring tasks, where the cloud agents clustered around a 0.650 quality score. Other local models showed trade-offs: LFM2 hit 77.1 t/s but sacrificed roughly 5 quality points, while DeepSeek-R1 averaged 123.5 seconds per task on 16GB hardware, making it impractical as a default. Security scoring was flat due to a human error in the heuristic system. The bandit routing showed sublinear regret (β = 0.659) across 200-task simulations, indicating efficient convergence. Cloud escalation fires only on retry, keeping costs near zero for well-matched tasks.
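A regret exponent like β = 0.659 is typically estimated by fitting log cumulative regret against log t: a slope below 1 means regret grows sublinearly. The sketch below uses a synthetic reward stream (an assumption; the post does not describe Mahoraga's simulation harness), so only the fitting method is the point.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200  # matches the 200-task simulations mentioned in the post

# Synthetic per-step regret decaying like t^(beta-1), so cumulative regret ~ t^beta
true_beta = 0.66
t = np.arange(1, T + 1)
step_regret = t ** (true_beta - 1) * (1 + 0.1 * rng.standard_normal(T))
cum_regret = np.cumsum(np.clip(step_regret, 1e-6, None))

# Fit log(cumulative regret) = beta * log(t) + c; slope < 1 => sublinear regret
beta, intercept = np.polyfit(np.log(t), np.log(cum_regret), 1)
print(f"fitted regret exponent beta ~ {beta:.3f}")
```

With a linear-regret baseline (constant per-step regret) the same fit would return a slope near 1, which is what the β = 0.659 figure is being contrasted against.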
- Qwen3 4B beats cloud agents on code tasks with 33.8 t/s and 6.1s latency
- Mahoraga uses LinUCB bandit with 9D context vector for sublinear regret (β=0.659)
- Zero-cost evaluation via 4-layer heuristic (novelty, structure, embedding, length)
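The four evaluation layers named in the bullets could be composed along these lines. Every scoring rule below is an illustrative assumption, not Mahoraga's implementation: in particular, the embedding layer is approximated with a bag-of-words cosine so the sketch stays dependency-free, and the equal weights and 300-token length target are invented for the example.

```python
import math
import re
from collections import Counter

def novelty(output: str, prompt: str) -> float:
    """Layer 1: fraction of output tokens not copied from the prompt."""
    out, src = set(output.lower().split()), set(prompt.lower().split())
    return len(out - src) / max(len(out), 1)

def structure(output: str) -> float:
    """Layer 2: reward code fences, headers, and lists as structure signals."""
    signals = [r"```", r"^#+ ", r"^\s*[-*] ", r"^\s*\d+\. "]
    hits = sum(bool(re.search(p, output, re.MULTILINE)) for p in signals)
    return hits / len(signals)

def similarity(output: str, prompt: str) -> float:
    """Layer 3: bag-of-words cosine as a cheap stand-in for embedding similarity."""
    a, b = Counter(output.lower().split()), Counter(prompt.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def length_score(output: str, target: int = 300) -> float:
    """Layer 4: penalize outputs far from a target token count (assumed target)."""
    n = len(output.split())
    return math.exp(-abs(n - target) / target)

def quality(output: str, prompt: str) -> float:
    # Equal weighting is an assumption; the real heuristic's weights are not public.
    return 0.25 * (novelty(output, prompt) + structure(output)
                   + similarity(output, prompt) + length_score(output))
```

Because every layer is pure string processing, scoring an output costs no API calls, which is what makes the evaluation "zero-cost".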
Why It Matters
Enables cost-free, high-quality AI task routing on consumer hardware, challenging cloud dependency.