Qwen3.5-35B-A3B running on a Raspberry Pi 5 (16GB and 8GB variants)
A developer runs a 35-billion-parameter AI model on an $80 Raspberry Pi 5, achieving usable speeds for agentic tasks.
In a notable demonstration of edge AI capability, a developer has deployed Alibaba's Qwen3.5-35B-A3B, a 35-billion-parameter large language model, on the Raspberry Pi 5. The project, shared on Reddit, used custom llama.cpp builds and aggressive 2-bit quantization to squeeze the model onto both the 16GB and 8GB variants of the board. Initial results show the system generating over 3 tokens per second on the 16GB Pi and 1.5 t/s on the 8GB model, even though both boards ran from SD cards and throttled under inadequate cooling. The "A3B" suffix follows Qwen's naming convention for mixture-of-experts models with roughly 3 billion parameters active per token, which helps explain the result: throughput approaches that of smaller 4-bit quantized models like Qwen3-4B-VL, while the model has nearly nine times as many total parameters.
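The memory and speed figures above can be sanity-checked with rough arithmetic. The sketch below is a back-of-envelope estimate, not a measurement from the project: the bits-per-weight values are assumed averages (quantized files carry per-block scales, so a "2-bit" file typically averages more than 2.0 bits per weight), and the 3-billion active-parameter count is inferred from the "A3B" naming convention rather than stated in the post.

```python
# Back-of-envelope sizing. Every constant here is an illustrative assumption,
# not a figure reported by the project.

GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate bytes needed to store n_params weights."""
    return n_params * bits_per_weight / 8


# Assumed effective bits per weight, including per-block overhead.
BPW_2BIT = 2.5
BPW_4BIT = 4.5

# Whole 35B model at ~2-bit: what has to live in RAM or be paged from storage.
total_35b = weight_bytes(35e9, BPW_2BIT)

# Weights read per generated token, assuming ~3B active parameters (MoE).
active_3b = weight_bytes(3e9, BPW_2BIT)

# A dense 4B model at ~4-bit reads all of its weights for every token.
dense_4b = weight_bytes(4e9, BPW_4BIT)

print(f"35B total  @ {BPW_2BIT} bpw: {total_35b / GIB:.1f} GiB")
print(f"3B active  @ {BPW_2BIT} bpw: {active_3b / GIB:.2f} GiB per token")
print(f"4B dense   @ {BPW_4BIT} bpw: {dense_4b / GIB:.2f} GiB per token")
```

Under these assumptions the full model occupies roughly 10 GiB, which fits in 16GB of RAM but not in 8GB, so on the smaller board some weights would have to be read from storage on demand. The per-token figures also suggest why a sparse 35B model can keep pace with a dense 4B one: it touches fewer bytes per token.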
The result highlights how quickly model compression and inference engines are improving. The developer used 2-bit quantization, a technique that stores each weight in roughly two bits instead of the usual 16, drastically reducing model size and memory requirements and making it feasible to run on sub-$100 hardware. The next steps are to install an SSD HAT for faster storage, add a better cooler to mitigate throttling, and experiment further with ARM's KleidiAI software stack. The work has direct implications for low-cost, local AI agents in education, personal assistants, and privacy-sensitive tasks that cannot rely on cloud APIs, broadening access to advanced AI capabilities.
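To make the idea of 2-bit quantization concrete, here is a minimal sketch of block-wise quantization in plain Python. It is a simplified illustration, not the scheme llama.cpp or the developer's custom builds actually use (those formats are more elaborate); the function names are hypothetical. Each block of weights is mapped to one of four levels between its minimum and maximum, and four 2-bit codes are packed into each byte.

```python
from typing import List, Tuple


def quantize_block_2bit(weights: List[float]) -> Tuple[float, float, bytes]:
    """Map each weight to one of 4 levels and pack 4 codes per byte.

    Returns (scale, minimum, packed_codes). Illustrative only.
    """
    lo, hi = min(weights), max(weights)
    # 4 levels means 3 steps between lo and hi; fall back to 1.0 for a flat block.
    scale = (hi - lo) / 3 or 1.0
    codes = [min(3, max(0, round((w - lo) / scale))) for w in weights]

    packed = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            byte |= code << (2 * j)  # each code occupies 2 bits
        packed.append(byte)
    return scale, lo, bytes(packed)


def dequantize_block_2bit(scale: float, lo: float, packed: bytes, n: int) -> List[float]:
    """Recover n approximate weights from the packed 2-bit codes."""
    out = []
    for i in range(n):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        out.append(lo + code * scale)
    return out


block = [-1.0, -0.5, 0.1, 0.9, 1.0, 0.0, -0.2, 0.4]
scale, lo, packed = quantize_block_2bit(block)
restored = dequantize_block_2bit(scale, lo, packed, len(block))
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(f"{len(block)} weights -> {len(packed)} bytes, max error {max_err:.3f}")
```

The eight example weights shrink to two bytes of codes plus a scale and an offset for the block. The cost is precision: each restored weight can be off by up to half a quantization step, which is why 2-bit models trade some output quality for their small footprint.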
- Alibaba's 35B-parameter Qwen3.5-35B-A3B model runs on a Raspberry Pi 5 using 2-bit quantization and llama.cpp
- Achieves 3+ tokens/sec on 16GB Pi and 1.5+ t/s on 8GB Pi, approaching speeds of much smaller 4B models
- Demonstrates feasibility of powerful local AI agents on sub-$100 edge hardware for education and privacy-focused apps
Why It Matters
Enables powerful, private AI assistants and educational tools on ultra-cheap, local hardware, reducing cloud dependency and cost.