Google DeepMind's Gemma 4: five sizes, 256K context, multimodal reasoning
New open-weight models handle text, images, and audio, and offer configurable thinking modes.
Google DeepMind has launched Gemma 4, a family of open-weight multimodal models built for reasoning, coding, and agentic workflows. The lineup comes in five sizes (E2B, E4B, 12B, 26B A4B, and 31B) and spans both dense and Mixture-of-Experts (MoE) architectures. All models accept text and image inputs at variable aspect ratios and resolutions, while the E2B, E4B, and 12B variants also natively process audio and video. A standout feature is configurable thinking modes for enhanced reasoning. Context windows reach 128K tokens on the small models and 256K on the medium models, and the family supports more than 140 languages.
For agentic and coding use cases, Gemma 4 improves on coding benchmarks and includes native function-calling support for autonomous agents. A hybrid attention mechanism interleaves local sliding-window attention layers with full global attention layers, and the final layer is always global. Because local layers attend only to a recent window of tokens, this reduces the memory footprint while preserving long-context awareness. The global layers use unified Keys/Values and Proportional RoPE (p-RoPE) to handle long sequences efficiently. The smaller models are optimized for on-device execution on laptops and mobile devices, while the larger variants suit workstations and servers. Gemma 4 also supports system prompts natively for structured conversations.
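To make the memory argument concrete, here is a minimal Python sketch of an interleaved local/global layer schedule and the attention masks each layer type would use. It is an illustration only, not Gemma 4's implementation: the article does not state the local-to-global ratio, the window size, or the layer count, so `local_per_global`, `window`, and every number below are assumptions, and the function names are hypothetical.

```python
from typing import List, Optional


def layer_schedule(num_layers: int, local_per_global: int) -> List[str]:
    """Label each layer 'local' or 'global'.

    Assumption: every group of `local_per_global` local layers is followed
    by one global layer. The final layer is forced to be global, as the
    article describes.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    kinds = []
    for i in range(num_layers):
        is_global = (i + 1) % (local_per_global + 1) == 0
        kinds.append("global" if is_global else "local")
    kinds[-1] = "global"
    return kinds


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives full causal (global) attention; an integer window
    restricts each query to itself and the window - 1 tokens before it.
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def cached_positions(kinds: List[str], seq_len: int, window: int) -> int:
    """Count key/value positions kept across all layers.

    Global layers keep every position; local layers only need the last
    `window` positions, which is where the memory saving comes from.
    """
    return sum(
        seq_len if kind == "global" else min(seq_len, window) for kind in kinds
    )


if __name__ == "__main__":
    kinds = layer_schedule(num_layers=6, local_per_global=2)
    print(kinds)
    print("hybrid cache positions:", cached_positions(kinds, 1000, 100))
    print("all-global cache positions:", cached_positions(["global"] * 6, 1000, 100))
```

With these made-up settings (six layers, two local layers per global layer, a 100-token window, a 1,000-token sequence), the hybrid schedule keeps 2,400 key/value positions against 6,000 for an all-global stack. The gap widens as the sequence grows, because only the global layers scale with sequence length.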
- Five sizes: E2B, E4B, 12B, 26B A4B, and 31B, with dense and MoE architectures.
- Multimodal: text and images on all models, audio and video on E2B/E4B/12B, with context windows of up to 256K tokens.
- Configurable thinking modes, native function calling for agents, and 140+ language support.
Why It Matters
Open-weight Gemma 4 brings frontier multimodal AI to phones, laptops, and servers, enabling custom agents and reasoning at scale.