llama.cpp b9494 adds non-causal vision for Gemma 4

llama.cpp Releases June 04, 2026

⚡ Local LLM runner now supports Gemma 4's unified vision mode.

Deep Dive

llama.cpp, the popular open-source project for running large language models locally, has released version b9494. The headline change enables non-causal attention for vision input in Gemma 4's unified mode. Under the usual causal mask, each token attends only to the tokens before it; with that mask lifted for image input, Gemma 4 can attend to all parts of an image at once, which is intended to improve visual understanding.
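To make the difference between causal and non-causal attention concrete, here is a minimal sketch in plain Python. It is not llama.cpp's implementation: the function name `build_attention_mask` and the rule that each contiguous run of image tokens forms one bidirectional block are assumptions made for illustration only.

```python
def build_attention_mask(is_image):
    """Return mask[i][j] == True when position i may attend to position j.

    is_image: list of bools, True where the token is an image token.
    Assumption (illustrative, not taken from llama.cpp): each contiguous
    run of image tokens forms one block whose tokens attend to each other
    in both directions; every other pair follows the causal rule.
    """
    n = len(is_image)

    # Label each contiguous run of image tokens with a block id.
    # Text tokens keep the label None.
    block_id = [None] * n
    current = -1
    for i, img in enumerate(is_image):
        if img:
            if i == 0 or not is_image[i - 1]:
                current += 1
            block_id[i] = current

    # Start from a standard causal mask: position i sees positions j <= i.
    mask = [[j <= i for j in range(n)] for i in range(n)]

    # Lift the causal restriction inside each image block.
    for i in range(n):
        for j in range(i + 1, n):
            if block_id[i] is not None and block_id[i] == block_id[j]:
                mask[i][j] = True
    return mask


# Two text tokens, three image tokens, one trailing text token.
tokens = [False, False, True, True, True, False]
mask = build_attention_mask(tokens)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In the printed grid, the three image rows share the same visibility (each sees the two preceding text tokens and all three image tokens), while the text rows keep the usual lower-triangular pattern.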

The release includes prebuilt binaries for multiple platforms: macOS (Apple Silicon and Intel), Linux (x86, ARM, and s390x, with Vulkan, ROCm, and OpenVINO variants), Windows (CPU, CUDA, Vulkan, and HIP), Android ARM64, and iOS. Some builds, such as KleidiAI, SYCL, and HIP, are disabled in this release. The update reflects ongoing work to make state-of-the-art multimodal models run efficiently on consumer hardware.

Key Points

Non-causal attention for Gemma 4 unified vision mode enables full bidirectional image processing
Supports 10+ platform builds including Apple Silicon, Windows CUDA, and Android ARM64
Part of llama.cpp's continuous integration of Google's Gemma 4 model family for local inference

Why It Matters

Brings Gemma 4's advanced vision to local inference, enabling private on-device image analysis.
