Developer Tools

llama.cpp b9494 adds non-causal vision for Gemma 4

Local LLM runner now supports Gemma 4's unified vision mode.

Deep Dive

llama.cpp, the popular open-source project for running large language models locally, has released version b9494. The headline change enables non-causal vision for Gemma 4 unified: image tokens are processed without the usual left-to-right causal attention mask. Under a causal mask, each token can attend only to the tokens before it. Without it, every part of an image can attend to every other part at once, which is intended to improve visual understanding.
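To make the difference between the two masks concrete, here is a minimal sketch in plain Python. It is an illustration only and is not taken from the llama.cpp source. The function name `build_mask`, the `is_image` flag list, and the choice to keep text tokens causal while letting each contiguous run of image tokens attend bidirectionally are all assumptions made for this example; b9494 may implement the feature differently.

```python
def build_mask(is_image):
    """Return mask[i][j] == True when query position i may attend to key position j.

    is_image: list of bools, True where the token at that position is an image token.
    (Hypothetical helper for illustration; not a llama.cpp API.)
    """
    n = len(is_image)

    # Give each contiguous run of image tokens its own block id.
    block = [None] * n
    current = -1
    for i, img in enumerate(is_image):
        if img:
            if i == 0 or not is_image[i - 1]:
                current += 1
            block[i] = current

    # Start from the ordinary causal mask: position i sees positions 0..i.
    mask = [[j <= i for j in range(n)] for i in range(n)]

    # Lift the causal restriction inside each image block (assumed behavior).
    for i in range(n):
        for j in range(n):
            if block[i] is not None and block[i] == block[j]:
                mask[i][j] = True
    return mask


# One text token, three image tokens, one text token.
tokens = [False, True, True, True, False]
for row in build_mask(tokens):
    print("".join("1" if allowed else "0" for allowed in row))
# Prints:
# 10000
# 11110
# 11110
# 11110
# 11111
```

In the printed grid, the three middle rows are identical: each image token sees the whole image, including positions that come after it. The text tokens in the first and last rows still see only themselves and earlier positions.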

The release includes prebuilt binaries for multiple platforms: macOS (Apple Silicon and Intel), Linux (x86, ARM, and s390x, with Vulkan, ROCm, and OpenVINO variants), Windows (CPU, CUDA, Vulkan, HIP), Android ARM64, and iOS. Some build variants, such as KleidiAI, SYCL, and HIP, are disabled in this release. The update reflects ongoing work to make state-of-the-art multimodal models run efficiently on consumer hardware.

Key Points
  • Non-causal attention in Gemma 4 unified vision mode lets image tokens attend to each other bidirectionally
  • Ships 10+ prebuilt platform builds, including Apple Silicon, Windows CUDA, and Android ARM64
  • Part of llama.cpp's continuous integration of Google's Gemma 4 model family for local inference

Why It Matters

Brings Gemma 4's advanced vision to local inference, enabling private on-device image analysis.