Did Google hide the best version of Gemma 4 e4b in Android? The extracted model beats Unsloth and everything else I've tried.
A 3.6GB model extracted from Google's AI Edge Gallery outperforms a larger 3.7GB alternative in reasoning tests.
A Reddit user, u/LawyerCompetitive478, reports that Google's Gemma 4 e4b model, extracted via ADB from the Android AI Edge Gallery app in Google's litertlm format, outperforms every other version they have tested, including Unsloth's 3.7GB GGUF (gemma-4-E4B-it-UD-Q2_K_XL.gguf). The Android build is only 3.6GB, yet shows stronger reasoning and coherence, particularly in Russian text generation. In contrast, the litert-community/gemma-4-E4B-it-litert-lm variant produced incoherent output, pointing to possible bugs or quantization issues in community builds.
This finding underscores how much model format and optimization affect quality. Google's proprietary litertlm format, presumably tailored to the AI Edge Gallery runtime, may carry edge-inference optimizations that improve output quality without increasing file size. It also shows that official, platform-specific builds can beat generalized open-source conversions, even when slightly smaller. For developers and AI enthusiasts, extracting a model from an official source may therefore yield better results than downloading conversions from repositories like Unsloth or Hugging Face, especially for multilingual tasks.
- Google's Gemma 4 e4b from AI Edge Gallery (3.6GB litertlm format) outperforms Unsloth's 3.7GB GGUF version in reasoning.
- The litert-community variant produces incoherent Russian text, indicating potential quantization or format issues.
- Model extraction via ADB reveals official Android builds may have optimizations not present in open-source downloads.
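For anyone curious about the extraction step itself, a minimal sketch of pulling a downloaded model off-device with ADB might look like the following. The package name, on-device path, and filename here are assumptions for illustration, not values confirmed by the post; verify them on your own device (e.g. with `adb shell pm list packages`).

```shell
#!/bin/sh
# Sketch: copy a .litertlm model downloaded by AI Edge Gallery to the host.
# PKG and the search path are assumptions -- confirm them on your device.
PKG="com.google.ai.edge.gallery"   # assumed package name
DEST="./gemma-e4b.litertlm"        # local filename, arbitrary choice

if command -v adb >/dev/null 2>&1; then
  # Look for model files in the app's external-files area, a common
  # download location that is readable without root on many devices.
  SRC=$(adb shell "find /sdcard/Android/data/$PKG -name '*.litertlm'" 2>/dev/null | head -n 1)
  if [ -n "$SRC" ]; then
    adb pull "$SRC" "$DEST"
  else
    echo "no .litertlm file found; adjust the search path for your device"
  fi
else
  echo "adb not found; install Android platform-tools first"
fi
```

If the file lives in the app's private storage instead, `adb shell run-as "$PKG" ls files/` may reveal it on debuggable builds, though production apps usually require the external-files route above.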
Why It Matters
This shows that official, platform-optimized builds can beat larger open-source versions, which should inform edge AI deployment strategies.