Runs entirely on Jetson Orin NX SUPER 16GB with Gemma 4 E4B quantized to Q4_K_M, achieving ~200ms cached TTFT and 14-15 tok/s?

Runs entirely on Jetson Orin NX SUPER 16GB with Gemma 4 E4B quantized to Q4_K_M, achieving ~200ms cached TTFT and 14-15 tok/s.

Features 30+ sensors feeding natural language prompts, no Wi-Fi/BT/cellular, and on-device configuration via buttons, joystick, and encoder?

Features 30+ sensors feeding natural language prompts, no Wi-Fi/BT/cellular, and on-device configuration via buttons, joystick, and encoder.

Prompt engineering moved dynamic context out of the system block, dropping cached TTFT from multi-second to ~200ms?

Prompt engineering moved dynamic context out of the system block, dropping cached TTFT from multi-second to ~200ms.

Open Source

Sparky: fully offline suitcase robot with Gemma 4 E4B on Jetson Orin

r/LocalLLaMA May 15, 2026

⚡30+ sensors, no network, 200ms cached TTFT – Sparky runs entirely on-device.

Deep Dive

Sparky is a suitcase-sized robot built by Reddit user u/CreativelyBankrupt that runs completely offline on a Jetson Orin NX SUPER 16GB. It uses Gemma 4 E4B quantized to Q4_K_M via llama.cpp with q8_0 KV cache and flash attention, 12K context, native system role, and default samplers from the model card. Cached time-to-first-token (TTFT) sits around 200ms with a sustained 14-15 tok/s. Speech-to-text is handled by SenseVoiceSmall, text-to-speech by Piper with 43Hz mouth synchronization, and the lid display runs a PixiJS animated face. Vision and OCR are now native to Gemma 4, so the previous BLIP subprocess has been removed. Over 30 sensors (e.g., distance, temperature, motion) are folded into the prompt as natural language each turn.

A key innovation is the prompt structure that stabilizes the KV cache. The persona and tool definitions are fixed at the top, followed by the conversation history, while dynamic sensor and vision data go at the end of the latest user turn. Moving dynamic context out of the system block dropped cached TTFT from multiple seconds to ~200ms. The robot is fully configurable on-device via a button row, joystick, and analog encoder knob — no network interfaces at all. The maker is curious whether others are running Gemma 4 E4B on Orin-class hardware and how they handle sensor/tool context without blowing the prefix cache. This project demonstrates that capable, opinionated AI robots can operate entirely edge-side without cloud dependencies.

Key Points

Runs entirely on Jetson Orin NX SUPER 16GB with Gemma 4 E4B quantized to Q4_K_M, achieving ~200ms cached TTFT and 14-15 tok/s.
Features 30+ sensors feeding natural language prompts, no Wi-Fi/BT/cellular, and on-device configuration via buttons, joystick, and encoder.
Prompt engineering moved dynamic context out of the system block, dropping cached TTFT from multi-second to ~200ms.

Why It Matters

Demonstrates that fully offline, edge AI robots with rich sensor integration are practical, reducing latency and privacy risks.

Read Original Article

Sparky: fully offline suitcase robot with Gemma 4 E4B on Jetson Orin

Why It Matters

Related Articles

🚀 Stay Ahead in AI