Open Source

Google's Gemma4 runs locally in Chrome at roughly 20 tokens/sec on CPU

No GPU, no llama.cpp—just Chrome and 16GB RAM to run a 2.7B model.

Deep Dive

A new Chrome extension called Dobby lets you run a Gemma model (not Gemini Nano) entirely locally in Chrome, with no GPU and no external tools like llama.cpp required. Generation feels like roughly 20 tokens per second or faster on a laptop with 16GB RAM, though the creator says they have no measured speed data. Chrome limits each session to 9,216 tokens. Built by Reddit user Some-Cauliflower4902 in five minutes, the extension works for spelling checks, summarizing long posts, or just private chatting.
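To put those two numbers side by side, here is a rough back-of-the-envelope sketch. It assumes the whole 9,216-token session were spent on generated text at a steady 20 tokens per second, which is the creator's unmeasured impression rather than a benchmark. The function names are illustrative and are not part of Dobby.

```python
# Back-of-the-envelope numbers only. The 20 tokens/sec rate is an
# assumption taken from the creator's "feels like" estimate.
SESSION_LIMIT_TOKENS = 9_216   # per-session limit reported in the post
EST_TOKENS_PER_SEC = 20        # assumed, unmeasured rate


def seconds_to_fill_session(limit_tokens: int, tokens_per_sec: float) -> float:
    """Time needed to produce `limit_tokens` at a steady `tokens_per_sec`."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return limit_tokens / tokens_per_sec


def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Measured throughput from a token count and wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


secs = seconds_to_fill_session(SESSION_LIMIT_TOKENS, EST_TOKENS_PER_SEC)
print(f"{secs:.1f} s (~{secs / 60:.1f} min) to fill one session")
# prints "460.8 s (~7.7 min) to fill one session"
```

Anyone wanting a real figure instead of an impression could time one reply with a stopwatch, count its tokens, and pass both to `tokens_per_second`.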

Key Points
  • Runs a Google Gemma model (billed as Gemma4, not Gemini Nano) locally in Chrome without any GPU, using only CPU and 16GB RAM.
  • Feels like ~20 tokens/second in use, an estimate the creator has not measured, with a session limit of 9,216 tokens set by Chrome.
  • Available as 'Dobby' on the Chrome Web Store or as open-source code on GitHub for tinkering.

Why It Matters

Enables private, offline AI inference on consumer hardware without specialized setups or cloud dependencies.