Developer Tools

llama.cpp b9102 adds im2col_3d, expands cross-platform builds

The popular LLM inference engine gets a new operator and broader hardware support.

Deep Dive

llama.cpp, the go-to C++ inference engine for running large language models locally, has released version b9102. This release introduces the im2col_3d operator (Pull Request #22903), a building block for 3D convolutions: it unfolds each 3D patch of the input into a row or column of a matrix so that the convolution can be computed as a matrix multiplication. Text-only LLMs do not use 3D convolutions, but the operator opens the door to models that convolve over volumetric or spatio-temporal input, such as certain vision-language or volumetric processing architectures. The commit carries GitHub's verified signature and reflects ongoing community-driven development.
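To make the idea behind an im2col-style operator concrete, here is a minimal pure-Python sketch of the general technique. It is an illustration only, not the code from PR #22903: the function names, the nested-list [C][D][H][W] layout, and the row ordering are assumptions chosen for readability, and padding and dilation are left out.

```python
def im2col_3d(volume, kd, kh, kw, stride=1):
    """Unfold a [C][D][H][W] nested-list volume into a 2D matrix.

    Hypothetical helper for illustration. Each row is one receptive field
    flattened in (channel, z, y, x) order; there is one row per output
    position. No padding or dilation is applied.
    """
    C = len(volume)
    D, H, W = len(volume[0]), len(volume[0][0]), len(volume[0][0][0])
    od = (D - kd) // stride + 1
    oh = (H - kh) // stride + 1
    ow = (W - kw) // stride + 1
    rows = []
    for oz in range(od):
        for oy in range(oh):
            for ox in range(ow):
                rows.append([
                    volume[c][oz * stride + z][oy * stride + y][ox * stride + x]
                    for c in range(C)
                    for z in range(kd)
                    for y in range(kh)
                    for x in range(kw)
                ])
    return rows, (od, oh, ow)


def conv3d_via_im2col(volume, kernels, stride=1):
    """3D cross-correlation computed as im2col followed by dot products.

    kernels has shape [OC][C][kd][kh][kw]; the result is [OC][od][oh][ow].
    """
    kd = len(kernels[0][0])
    kh = len(kernels[0][0][0])
    kw = len(kernels[0][0][0][0])
    cols, (od, oh, ow) = im2col_3d(volume, kd, kh, kw, stride)
    out = []
    for kernel in kernels:
        # Flatten the kernel in the same (channel, z, y, x) order as the rows.
        flat = [v for c in kernel for z in c for y in z for v in y]
        vals = [sum(a * b for a, b in zip(row, flat)) for row in cols]
        # Fold the flat list of outputs back into [od][oh][ow].
        out.append([
            [vals[(z * oh + y) * ow:(z * oh + y) * ow + ow] for y in range(oh)]
            for z in range(od)
        ])
    return out
```

The point of the unfolding step is that, once the patches are laid out as a matrix, the convolution itself reduces to a matrix product, which inference engines typically already have fast kernels for. The cost is extra memory, since overlapping patches are duplicated.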

The release's most practical impact is its extensive cross-platform support. Prebuilt binaries are available for macOS (Apple Silicon with or without KleidiAI optimizations, plus Intel x64), iOS (as an XCFramework), Linux (x64/arm64/s390x CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Windows (x64/arm64 CPU, CUDA 12 and 13 DLLs, Vulkan, SYCL, HIP), Android (arm64 CPU), and openEuler (x86 and aarch64 with Ascend 310p/910b and ACL Graph). With this breadth, developers can deploy llama.cpp across desktops, servers, edge devices, and even Huawei's Ascend NPUs without building from source.

Key Points
  • Adds im2col_3d operator (PR #22903) for 3D convolution support in model architectures
  • Prebuilt binaries cover macOS, Linux, Windows, Android, iOS, and openEuler with 12+ hardware backends
  • Includes Apple Silicon builds with KleidiAI optimizations and CUDA 12/13 DLLs for Windows

Why It Matters

llama.cpp remains one of the most versatile local LLM runtimes, now handling more model types across nearly all major platforms.