Developer Tools

llama.cpp b9433 restores Metal im2col for faster large kernel inference

llama.cpp b9433 brings back the Metal im2col path for large kernels, a change aimed at faster convolution on Apple Silicon.

Deep Dive

ggml-org released llama.cpp tag b9433, which restores the im2col implementation for large kernels in the Metal backend (#23901). Build targets include macOS and iOS (Apple Silicon arm64, Intel x64, iOS XCFramework), Linux (x64, arm64, s390x, with Vulkan, ROCm, OpenVINO, and SYCL variants), Android (arm64), Windows (x64, arm64, with CUDA 12/13, Vulkan, SYCL, and HIP variants), and openEuler. Some targets are marked as disabled in this release.
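
For readers unfamiliar with the term, im2col ("image to column") unrolls every kernel-sized patch of an input into a row, so that a convolution becomes a matrix multiplication. The sketch below illustrates the general technique in plain Python for a single-channel 2D input with stride 1 and no padding. It is not the llama.cpp Metal kernel, and the function names (im2col, conv2d_via_im2col) are hypothetical, chosen only for this example.

```python
def im2col(image, kh, kw):
    """Unroll each kh x kw patch of a 2D image into one flat row.

    Assumes stride 1 and no padding (illustrative only).
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    rows = []
    for i in range(out_h):
        for j in range(out_w):
            # Flatten the patch whose top-left corner is (i, j).
            patch = [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            rows.append(patch)
    return rows, out_h, out_w


def conv2d_via_im2col(image, kernel):
    """Convolve by multiplying the unrolled patches with the flattened kernel.

    Like most ML frameworks, this computes cross-correlation (no kernel flip).
    """
    kh, kw = len(kernel), len(kernel[0])
    rows, out_h, out_w = im2col(image, kh, kw)
    flat_kernel = [v for row in kernel for v in row]
    # Each output value is a dot product: one patch row times the kernel.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in rows]
    # Reshape the flat result back into out_h x out_w.
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 grid, values 0..15
kernel = [[1, 0], [0, 1]]                                  # sums each patch's diagonal
result = conv2d_via_im2col(image, kernel)
print(result)  # [[5, 7, 9], [13, 15, 17], [21, 23, 25]]
```

The unrolled matrix holds out_h * out_w * kh * kw values, so its size grows with the kernel area. That is one reason large kernels tend to get their own handling in GPU backends.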

Key Points
  • Restores the Metal im2col implementation for large kernels, aimed at faster convolution-heavy inference on Apple Silicon.
  • Supports 15+ build targets including macOS, Linux, Windows, Android, iOS, and openEuler.
  • Available with backends such as Vulkan, CUDA 12/13, ROCm 7.2, OpenVINO, and SYCL.

Why It Matters

Faster Metal kernels make local inference on Apple Silicon more practical, which can reduce reliance on cloud services for AI workloads.