llama.cpp b9389 auto-detects integrated GPUs for CUDA/HIP
New release automatically applies iGPU flag for AMD and NVIDIA integrated graphics.
The ggml-org/llama.cpp project has released build b9389. Its main change is that the CUDA and HIP backends now apply the iGPU flag automatically when they detect an integrated device. The change, merged via pull request #23007, removes the need to set that flag manually when running large language models on systems with integrated graphics from NVIDIA (CUDA) or AMD (HIP). Users on laptops and all-in-one PCs should no longer need to configure this by hand.
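To illustrate the idea, here is a minimal sketch of classifying a device as integrated or discrete from a property the runtime reports. It is not the code from the pull request: the names `DeviceProps`, `DeviceType`, and `classify_device` are hypothetical, and the only assumption borrowed from the real runtimes is that CUDA and HIP device properties include an `integrated` field.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceType(Enum):
    """Hypothetical device categories used only for this illustration."""
    GPU = "gpu"    # discrete GPU with its own dedicated memory
    IGPU = "igpu"  # integrated GPU that shares memory with the host


@dataclass(frozen=True)
class DeviceProps:
    """Hypothetical stand-in for the properties a GPU runtime reports."""
    name: str
    integrated: bool  # assumed to mirror the runtime's `integrated` field


def classify_device(props: DeviceProps) -> DeviceType:
    """Pick the device type from the reported property, with no user input."""
    return DeviceType.IGPU if props.integrated else DeviceType.GPU


# Example devices with made-up names; no real hardware is queried here.
devices = [
    DeviceProps(name="example discrete GPU", integrated=False),
    DeviceProps(name="example integrated GPU", integrated=True),
]
for dev in devices:
    print(f"{dev.name}: {classify_device(dev).value}")
```

The point of the sketch is that the decision comes from what the device reports rather than from a setting the user has to supply.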
The release provides pre-built binaries across multiple platforms. macOS users get Apple Silicon builds (with an optional KleidiAI-enabled variant) and Intel x64 builds, plus an iOS XCFramework. Linux builds target Ubuntu on x64, arm64, and s390x CPUs, along with Vulkan, ROCm 7.2, and OpenVINO backends. Windows binaries cover x64 and arm64 CPUs, plus CUDA 12 and 13 DLLs, Vulkan, and HIP. An Android arm64 build is also included. Some builds (SYCL FP32, openEuler) remain disabled.
- Automatically applies the iGPU flag for CUDA/HIP on integrated devices (PR #23007)
- Supports macOS, Linux, Windows, and Android across diverse backends
- CUDA 12 and 13 DLLs included for Windows CUDA builds
Why It Matters
Simplifies local AI inference on consumer hardware by removing a manual configuration step on laptops and desktops with integrated GPUs.