Ran Qwen 3.5 9B on M1 Pro (16GB) as an actual agent, not just a chat demo. Honest results.
A developer ran real automation tasks locally, proving small models can handle practical agent work.
Developer Joozio conducted a practical experiment running Alibaba's Qwen 3.5 9B model as a functional agent on a consumer 16GB M1 Pro MacBook, moving beyond simple chat demos. Using Ollama to provide an OpenAI-compatible API, he integrated the local model into his existing Claude Code-based automation system with a one-line configuration change. The test involved executing real tasks from his personal queue, focusing on whether a smaller, local model could handle a meaningful subset of agentic work without relying on cloud APIs like Claude Opus. The core finding is a shift in perspective: local AI isn't about competing with frontier models, but about identifying which agent tasks—often simple file reading, formatting, and routing—can run efficiently and privately on-device.
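The article doesn't reproduce the configuration itself, but the "one-line change" it describes typically amounts to repointing an OpenAI-compatible client at Ollama's local endpoint. A minimal sketch, assuming Ollama's default port and using a placeholder model tag (the exact tag for the model in the article is not given):

```python
import json
from urllib import request

# Ollama exposes an OpenAI-compatible API at this local endpoint by default.
# The "one-line change": swap the cloud base URL for this local one.
OLLAMA_BASE_URL = "http://localhost:11434/v1"

def build_chat_request(messages, model="qwen3:8b"):
    """Build an OpenAI-style chat-completions payload (model tag is a placeholder)."""
    return {"model": model, "messages": messages}

def chat(messages, model="qwen3:8b"):
    """POST the payload to the local Ollama server and return the reply text."""
    req = request.Request(
        f"{OLLAMA_BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(messages, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the request shape matches the cloud API, an existing automation stack can switch between local and hosted backends without touching the rest of the pipeline.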
The Qwen 3.5 9B model performed reasonably well on straightforward agent functions: memory recall (reading and surfacing context from structured files) worked correctly, and basic tool calling was reliable for simple requests. There was a noticeable gap in creative and complex reasoning compared to larger models, but the speed was acceptable for the targeted use case. Notably, Joozio also probed the frontier of on-device AI by running even smaller Qwen models (0.8B and 2B) entirely offline on an iPhone 17 Pro using the open-source PocketPal AI app. The experiment validates a growing trend: leveraging cost-effective, private local inference for specific automation pipelines, reducing dependency on per-token cloud costs and data transmission for suitable tasks.
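The summary doesn't show the tool schemas involved, but "basic tool calling" in the OpenAI-compatible format a local server accepts looks roughly like the sketch below. The `read_file` tool and the dispatch helper are hypothetical illustrations, not the author's actual code:

```python
import json

# Hypothetical tool schema in the OpenAI function-calling format; an
# OpenAI-compatible local server accepts the same "tools" field.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a UTF-8 text file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def dispatch_tool_call(tool_call, registry):
    """Run the function named in an OpenAI-style tool call and return its result."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return registry[name](**args)
```

Simple, well-scoped tools like this (file reads, formatting, routing) are exactly the subset of agent work the article argues a small local model can handle reliably.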
- Qwen 3.5 9B ran real automation tasks on a 16GB M1 Pro MacBook via Ollama's OpenAI-compatible API.
- The model successfully handled memory recall and basic tool calling, proving it viable for simple agent work.
- Tiny models (Qwen 0.8B/2B) also ran fully offline on an iPhone 17 Pro, showcasing on-device AI maturity.
Why It Matters
Enables cost-effective, private automation on consumer hardware, reducing reliance on cloud APIs for simple agent tasks.