llama.cpp b9485 stops unnecessary mmproj downloads when --no-mmproj is set
Project with 114K stars reduces overhead for multimodal model users with a simple flag fix.
llama.cpp, the wildly popular open-source C/C++ inference engine for large language models (114K stars, 19.1K forks on GitHub), published a release tagged b9485 on June 3. The core fix addresses a long-standing annoyance: when a model was launched with the --no-mmproj flag (which skips loading a multimodal projector), llama.cpp still downloaded the projector weights it would never use. This wasted bandwidth and storage, especially for users running pure language models or whose setup was already in place. The change, tracked in issue/pull #23425, makes the download step respect the flag.
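The intended behavior amounts to a simple guard on the download step. The C++ sketch below is illustrative only: the `LaunchOptions` struct, its fields, the file names, and `files_to_download` are hypothetical names invented for this example and are not taken from the llama.cpp source or from the actual patch.

```cpp
#include <string>
#include <vector>

// Hypothetical launch options; field names are illustrative, not llama.cpp's.
struct LaunchOptions {
    std::string model_repo;  // remote repository that holds the model
    bool        has_mmproj;  // the repository also ships a projector file
    bool        no_mmproj;   // the user passed --no-mmproj
};

// Returns the files a launcher would need to fetch for these options.
std::vector<std::string> files_to_download(const LaunchOptions & opts) {
    // The language model itself is always required.
    std::vector<std::string> files = { opts.model_repo + "/model.gguf" };

    // Fetch the projector only if one exists and the user did not disable it.
    if (opts.has_mmproj && !opts.no_mmproj) {
        files.push_back(opts.model_repo + "/mmproj.gguf");
    }
    return files;
}
```

Under these assumptions, calling `files_to_download` with `no_mmproj` set to `true` yields only the model file, while the same call with `no_mmproj` set to `false` also lists the projector.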
Beyond that single fix, the release packages binaries across the usual wide matrix: Apple Silicon (both standard and KleidiAI-enabled), Intel macOS, iOS XCFramework, multiple Linux variants (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32), Android arm64, and Windows builds (CPU, CUDA 12/13, Vulkan, HIP). Some targets, such as openEuler, are listed as disabled. The release is incremental, but it saves power users from manually cleaning up unwanted mmproj files and shows ongoing maintenance of a project that underpins many local AI deployments on consumer hardware.
- Stops mmproj from being downloaded when --no-mmproj is set (issue/pull #23425)
- Released June 3 as tag b9485 on 114K-star repository
- Includes builds for macOS, Linux, Windows, Android, iOS across many backends
Why It Matters
Saves bandwidth and disk space for local AI users who run only text models and do not need the multimodal projector.