llama.cpp b9251 improves multimodal fitting and backend support
New release makes mtmd parameter fitting account for mmproj and adds ggml_backend_dev_t support.
The ggml-org team has released llama.cpp b9251, a maintenance update focused on multimodal processing and internal cleanup. The key change is in mtmd, llama.cpp's multimodal library: the fit_params function now takes the mmproj (the multimodal projector) into account, so parameter fitting covers the projector as well as the language model. This matters for vision-language models that depend on a projector. The codebase has also been tidied: alloc_compute_meta was renamed to the more descriptive reserve_compute_meta, and several unused functions were removed. Support for ggml_backend_dev_t, ggml's backend device handle, was added, along with more debug logging to make troubleshooting easier.
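To make the idea of projector-aware fitting concrete, here is a minimal sketch in Python. It is not llama.cpp's code and does not reflect the real fit_params signature; it assumes that "fitting" means choosing how many layers to offload so that everything fits in free device memory, and every name and number in it (fit_layers, FitResult, the byte sizes) is hypothetical.

```python
from dataclasses import dataclass

MiB = 1024 ** 2
GiB = 1024 ** 3


@dataclass
class FitResult:
    n_gpu_layers: int  # number of layers placed on the device
    bytes_used: int    # estimated device memory consumed


def fit_layers(free_bytes: int, layer_bytes: int, n_layers: int,
               mmproj_bytes: int = 0) -> FitResult:
    """Toy memory fit (hypothetical, not the llama.cpp implementation).

    Reserves room for the projector first, then offloads as many
    equally sized layers as fit in the remaining budget.
    """
    if layer_bytes <= 0:
        raise ValueError("layer_bytes must be positive")
    if mmproj_bytes > free_bytes:
        # The projector alone does not fit, so nothing is placed on the device.
        return FitResult(0, 0)
    budget = free_bytes - mmproj_bytes
    n = min(n_layers, budget // layer_bytes)
    return FitResult(n, n * layer_bytes + mmproj_bytes)


# Illustrative numbers only: 8 GiB free, 32 layers of 256 MiB, 1 GiB projector.
without_proj = fit_layers(8 * GiB, 256 * MiB, 32)
with_proj = fit_layers(8 * GiB, 256 * MiB, 32, mmproj_bytes=1 * GiB)
print(without_proj.n_gpu_layers, with_proj.n_gpu_layers)
```

In this toy setup, ignoring the projector would plan for all 32 layers and leave no room for the projector itself, while counting it reduces the plan to 28 layers. That is the kind of mismatch a projector-aware fit is meant to avoid.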
On the platform side, this release continues llama.cpp's tradition of broad compatibility. Prebuilt binaries are available for macOS (Apple Silicon with optional KleidiAI, Intel, and iOS XCFramework), Linux (Ubuntu x64/arm64/s390x with CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Android (arm64 CPU), Windows (x64/arm64 CPU, CUDA 12.4/13.1, Vulkan, SYCL, HIP), and openEuler (x86 and aarch64 with 310p/910b ACL Graph). With 30 assets, this release ensures users across diverse hardware stacks can benefit from the improvements.
- mtmd's fit_params now takes mmproj into account when fitting multimodal models
- Renamed alloc_compute_meta to reserve_compute_meta and removed unused functions for cleaner code
- Added ggml_backend_dev_t support and more debug logging; 30 prebuilt assets across macOS, Linux, Android, Windows, and openEuler
Why It Matters
This release makes multimodal parameter fitting aware of the projector and keeps prebuilt binaries available across a wide range of hardware for llama.cpp users.