llama.cpp b9124 exposes multimodal capabilities via /v1/models endpoint
New release adds modality detection for OpenAI-compatible API servers
Deep Dive
llama.cpp released b9124, which exposes model modalities via the /v1/models endpoint. Builds are available for macOS (Apple Silicon/Intel), Linux (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (CPU, CUDA 12/13, Vulkan, SYCL, HIP), Android, and other platforms.
Key Points
- Adds an `mtmd_caps` field to the `/v1/models` response that lists the modalities a model supports (text, image, etc.)
- Includes prebuilt binaries for 20+ platforms: macOS, Linux (CPU/Vulkan/ROCm/OpenVINO/SYCL), Windows (CPU/CUDA/Vulkan/SYCL/HIP), Android, and openEuler
- The release commit is signed with a verified GPG key, and builds cover both CPU-only and GPU-accelerated backends
Why It Matters
The change streamlines multimodal AI deployment: a client can query a standard API endpoint to detect which modalities a served model supports, rather than relying on out-of-band knowledge about the model.
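
As a rough illustration of client-side capability detection, the Python sketch below reads `mtmd_caps` from a `/v1/models` response. The release notes name the field but not its shape, so several things here are assumptions: that `mtmd_caps` sits at the top level of each entry in the OpenAI-style `data` list, that it is either a list of modality names or a mapping of names to booleans, and that the server listens at `http://localhost:8080`. The helper names `extract_modalities` and `fetch_modalities` are hypothetical.

```python
import json
from urllib.request import urlopen


def extract_modalities(payload: dict) -> dict:
    """Map each model id to the modalities found in its `mtmd_caps` field."""
    result = {}
    for entry in payload.get("data", []):
        # Assumption: the field lives at the top level of each model entry.
        caps = entry.get("mtmd_caps")
        if isinstance(caps, dict):
            # Assumed shape A: {"image": true, "audio": false}
            modalities = sorted(name for name, enabled in caps.items() if enabled)
        elif isinstance(caps, list):
            # Assumed shape B: ["text", "image"]
            modalities = [str(name) for name in caps]
        else:
            # Field missing or in an unexpected shape: report nothing.
            modalities = []
        result[entry.get("id", "<unknown>")] = modalities
    return result


def fetch_modalities(base_url: str = "http://localhost:8080") -> dict:
    """Query a running server; the base URL is an assumed local default."""
    with urlopen(f"{base_url}/v1/models", timeout=5) as response:
        return extract_modalities(json.load(response))
```

A client could call `fetch_modalities()` once at startup and only offer image upload when `"image"` appears for the selected model. Inspect the actual response from your build before relying on either assumed shape.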