llama.cpp b9433 restores Metal im2col for faster large-kernel inference
llama.cpp tag b9433 brings back the Metal im2col implementation for large kernels, a change aimed at performance on Apple Silicon.
Deep Dive
ggml-org released llama.cpp tag b9433, which restores the im2col implementation for large kernels in the Metal backend (#23901). Build targets include macOS and iOS (Apple Silicon arm64, Intel x64, iOS XCFramework), Linux (x64, arm64, s390x, with Vulkan, ROCm, OpenVINO, and SYCL variants), Android (arm64), Windows (x64, arm64, with CUDA 12/13, Vulkan, SYCL, and HIP variants), and openEuler. Some of these targets are disabled in this release.
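For readers unfamiliar with the operation, im2col unrolls every input patch that a convolution kernel slides over into one row of a matrix, so the whole convolution reduces to a matrix product. The sketch below illustrates that idea in plain Python for a single-channel 2D input with stride 1 and no padding. It is not the ggml Metal kernel, and the function names (`im2col`, `conv2d_im2col`, `conv2d_direct`) are hypothetical names chosen for this example.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D list into one row (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = []
    for i in range(out_h):
        for j in range(out_w):
            # Flatten the patch whose top-left corner is (i, j), row by row.
            patch = [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            cols.append(patch)
    return cols, out_h, out_w


def conv2d_im2col(image, kernel):
    """Convolution (cross-correlation, as used in ML) via im2col plus dot products."""
    kh, kw = len(kernel), len(kernel[0])
    cols, out_h, out_w = im2col(image, kh, kw)
    flat_kernel = [v for row in kernel for v in row]
    # Matrix-vector product: one dot product per unrolled patch.
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in cols]
    # Reshape the flat result back into an out_h x out_w grid.
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


def conv2d_direct(image, kernel):
    """Reference implementation with nested loops, used only to check the result."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, -1]]
assert conv2d_im2col(image, kernel) == conv2d_direct(image, kernel)
```

Each unrolled row holds `kh * kw` values, so the intermediate matrix grows with the kernel area. That is one reason an implementation might treat large kernels as a separate case, although the source does not describe how the Metal backend handles it.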
Key Points
- Restores the Metal im2col implementation for large kernels, which is intended to speed up inference on Apple Silicon.
- Supports 15+ build targets including macOS, Linux, Windows, Android, iOS, and openEuler.
- Available with backends such as Vulkan, CUDA 12/13, ROCm 7.2, OpenVINO, and SYCL.
Why It Matters
Local inference on Apple Silicon should benefit from the restored Metal path, which can reduce reliance on cloud services for AI workloads.