Open Source

Running Qwen3.5-0.8B on my 7-year-old Samsung S10E

A developer runs Alibaba's 0.8B-parameter Qwen3.5 model on a 7-year-old Samsung S10E, achieving 12 tokens per second.

Deep Dive

A developer from the r/LocalLLaMA community has successfully deployed Alibaba's newly released Qwen3.5-0.8B language model on a 7-year-old Samsung Galaxy S10E smartphone. The experiment, which required tinkering with the llama.cpp inference engine and the Termux terminal emulator on Android, resulted in a fully functional local LLM generating a practical 12 tokens per second. This milestone highlights the rapid miniaturization of capable AI, moving it from data-center exclusivity to pocket-sized hardware. Qwen3.5-0.8B is a 0.8-billion-parameter model from Alibaba's Qwen series, designed for efficiency while retaining useful conversational and reasoning abilities.
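
For readers curious what this kind of setup looks like in practice, here is a minimal sketch using llama.cpp's Python bindings (llama-cpp-python). To be clear about assumptions: the post itself ran llama.cpp directly under Termux, and the GGUF filename, context size, thread count, and prompt below are illustrative placeholders, not details from the post.

```python
# Minimal sketch: local inference with llama.cpp via its Python bindings
# (pip install llama-cpp-python). Filenames and settings are assumptions.
import time

from llama_cpp import Llama

# Hypothetical 4-bit GGUF quantization; the post does not name the exact file.
llm = Llama(
    model_path="qwen3.5-0.8b-q4_k_m.gguf",
    n_ctx=2048,    # modest context window to fit in phone-class RAM
    n_threads=4,   # S10E-class SoCs expose 8 cores; 4 is a conservative start
    verbose=False,
)

prompt = "Explain what a token is in one sentence."
start = time.time()
out = llm(prompt, max_tokens=64)
elapsed = time.time() - start

# The completion dict includes generated text and token usage counts,
# so generation speed can be measured directly.
n_tokens = out["usage"]["completion_tokens"]
print(out["choices"][0]["text"].strip())
print(f"{n_tokens / elapsed:.1f} tokens/sec")  # ~12 tok/s was the figure reported on the S10E
```

On-device, the same workflow can be reproduced by compiling llama.cpp inside Termux and invoking its command-line tools against the same GGUF file, which is closer to what the post describes.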

The technical achievement underscores a significant shift in the AI hardware landscape. Thanks to efficient inference frameworks like llama.cpp, even sub-$100 used smartphones from 2019 can now host a local LLM. The developer noted the model is "far from a gimmick" and can handle real tasks. This democratizes access to AI, enabling private, offline assistants and reducing reliance on cloud APIs. For the industry, it signals that the performance floor for edge-AI devices is rising rapidly, putting pressure on chipmakers and app developers to support local inference. The next frontier will be optimizing these models for even lower power consumption and integrating them seamlessly into mobile operating systems.

Key Points
  • Alibaba's Qwen3.5-0.8B model runs locally on a Samsung S10E (2019) using llama.cpp and Termux.
  • Achieves a generation speed of 12 tokens per second, making it usable for real-time conversation.
  • Proves capable AI no longer requires cutting-edge hardware or cloud connectivity, enabling private mobile assistants.

Why It Matters

Democratizes powerful AI by making it run on billions of existing smartphones, enabling private, offline assistants without cloud costs.