Follow-up: Qwen3 30B a3b at 7-8 t/s on a Raspberry Pi 5 8GB (source included)
A custom Debian image transforms an $80 Raspberry Pi into a local AI server with an OpenAI-compatible API.
Developer jslominski has released Potato OS, a specialized Debian image that turns the affordable Raspberry Pi 5 into a surprisingly capable local AI inference server. The system achieves 7-8 tokens/second while running the 30-billion-parameter Qwen3-30B-A3B model, quantized to 2.66 bits per weight (Q3_K_S), with the full 16,384-token context length. That performance comes from optimizations including a custom ik_llama.cpp build, prompt caching, and SSD storage, making it possible to run substantial language models on consumer hardware costing under $100.
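For readers who want to sanity-check the throughput figure on their own Pi, a minimal sketch follows. It assumes the server exposes the usual llama.cpp-style `/v1/chat/completions` route at `http://potato.local` and that each streamed delta carries roughly one token; the model id and path are placeholders, not confirmed by the project.

```python
# Rough tokens/sec check against the Pi's local endpoint (assumptions noted above).
import json
import time

import requests

BASE_URL = "http://potato.local/v1"  # assumed endpoint path; adjust port/path if needed

payload = {
    "model": "qwen3-30b-a3b",  # placeholder model id
    "messages": [{"role": "user", "content": "Explain mixture-of-experts models in two sentences."}],
    "max_tokens": 128,
    "stream": True,
}

start = time.monotonic()
tokens = 0
with requests.post(f"{BASE_URL}/chat/completions", json=payload, stream=True, timeout=300) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        # OpenAI-style servers stream Server-Sent Events prefixed with "data: "
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"].get("content")
        if delta:
            tokens += 1  # approximate: one streamed delta ~ one token

elapsed = time.monotonic() - start
print(f"~{tokens / elapsed:.1f} tokens/sec over {tokens} tokens")
```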
After flashing the image, the system automatically downloads a 1.8GB Qwen3.5 2B vision model within 5 minutes, creating a ready-to-use AI endpoint at http://potato.local. The platform exposes a fully OpenAI-compatible API on the local network, allowing integration with any application that supports standard AI interfaces. Advanced users can swap models via HuggingFace URLs or local uploads, while a basic web chat interface provides immediate testing. The project is a notable step toward democratizing local AI deployment, though it is still in early development and updates currently require reflashing the image.
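Because the API is OpenAI-compatible, existing clients only need their base URL redirected. Here is a minimal sketch using the standard `openai` Python client; the `/v1` path, the dummy API key, and the model id are assumptions rather than documented values of Potato OS.

```python
# Point a standard OpenAI client at the Pi instead of the cloud.
from openai import OpenAI

client = OpenAI(
    base_url="http://potato.local/v1",  # assumed endpoint path on the local network
    api_key="not-needed-locally",       # local servers typically ignore the key
)

resp = client.chat.completions.create(
    model="qwen3-30b-a3b",  # placeholder; use whatever model id the server reports
    messages=[{"role": "user", "content": "Give me a one-line status report."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```

Any framework that accepts a custom OpenAI base URL (LangChain, Open WebUI, editor plugins, and so on) can be redirected the same way.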
- Runs Qwen3-30B-A3B at 7-8 tokens/sec on Raspberry Pi 5 with 8GB RAM and SSD
- Exposes OpenAI-compatible local API and includes automatic model download (1.8GB Qwen3.5 2B vision)
- Uses 2.66 bpw quantization (Q3_K_S) and a custom ik_llama.cpp build for optimized performance
Why It Matters
Democratizes local AI deployment, enabling private, cost-effective inference on $80 hardware instead of relying on cloud services.