Open Source

Gemma 4 E2B runs 9 t/s on i5 6500, beats GPT-3.5 and rivals GPT-4

r/LocalLLaMA July 04, 2026

⚡A 4B parameter model outperforms ChatGPT 3.5 on a decade-old CPU

Deep Dive

Reddit user /u/InsideYork reports running Gemma 4 E2B on an Intel i5 6500 with no GPU at about 9 tokens per second. They describe it as very fast, with output much better than ChatGPT 3.5 and possibly as good as ChatGPT 4, though they note they have not used ChatGPT 4 much. Before this they used Qwen 3.5 4B, which they call amazing.
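To put the reported speed in perspective, here is a minimal Python sketch of the arithmetic behind a tokens-per-second figure. The 9 tokens/second rate comes from the post; the function names and the reply lengths are hypothetical, chosen only for illustration, and the post does not say how the user measured throughput.

```python
# Rough arithmetic for what a tokens/second figure means in practice.
# REPORTED_RATE comes from the Reddit post; everything else is illustrative.

REPORTED_RATE = 9.0  # tokens per second, as reported by the user


def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Throughput: tokens generated divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


def seconds_for_reply(reply_tokens: int, rate: float = REPORTED_RATE) -> float:
    """Estimated generation time for a reply of the given length."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return reply_tokens / rate


# Hypothetical reply lengths, in tokens (not taken from the post).
for reply_tokens in (90, 450, 900):
    wait = seconds_for_reply(reply_tokens)
    print(f"{reply_tokens} tokens at {REPORTED_RATE} t/s: about {wait:.0f} s")
```

At this rate a 450-token answer would take roughly 50 seconds to generate, not counting the time needed to process the prompt.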

Key Points

Gemma 4 E2B runs at 9 tokens/second on an i5 6500 CPU with no GPU
Output quality exceeds GPT-3.5 and approaches GPT-4, according to the user
Qwen 3.5 4B also praised; community recommends Phi-3 and Llama 3.2 3B for low-end hardware

Why It Matters

If this user's experience holds up, small local models can now rival GPT-3.5, and perhaps approach GPT-4, on old hardware with no GPU, making affordable, private AI far more accessible.
