llama.cpp b9483 fixes Hexagon profiler output, ships builds across desktop, server, and mobile
Profiler updates remove redundant 'NONE' entries and add a 'tot.usec' column for total per-operation timing.
The ggml-org team has tagged llama.cpp b9483, a maintenance release focused on the Qualcomm Hexagon DSP backend. The profiler output previously displayed redundant 'NONE' entries; this release removes them, which makes performance traces easier to read. The Hexagon profiling script also gains a 'tot.usec' column that reports the total time, in microseconds, spent in each operation. With totals visible per operation, it is easier to identify bottlenecks when running large language models on Hexagon-based devices such as smartphones and IoT hardware.
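To illustrate why a per-operation total helps, here is a minimal Python sketch that sums individual timings into one total per operation and ranks the results. It is not the project's profiling script: the function name `total_usec_by_op`, the `(op_name, usec)` record layout, and the sample values are all assumptions made for illustration, and the real profiler output format may differ.

```python
from collections import defaultdict


def total_usec_by_op(records):
    """Sum per-invocation timings into one total per operation, largest first.

    `records` is an iterable of (op_name, usec) pairs. This layout is a
    hypothetical stand-in, not the actual Hexagon profiler format.
    """
    totals = defaultdict(int)
    for op_name, usec in records:
        totals[op_name] += usec
    # Sort by total time so the most expensive operation comes first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data for illustration only; not real profiler output.
sample = [
    ("MUL_MAT", 1200),
    ("ADD", 40),
    ("MUL_MAT", 1100),
    ("SOFT_MAX", 300),
    ("ADD", 35),
]

for op_name, total in total_usec_by_op(sample):
    print(f"{op_name:10s} {total:8d}")
```

Ranking by total rather than by individual call surfaces operations that are cheap per call but invoked often, which a per-call timing alone can hide.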
The release keeps llama.cpp's broad platform coverage. Builds are available for macOS (Apple Silicon with optional KleidiAI, Intel x64), Linux (x64, arm64, s390x, with Vulkan, ROCm 7.2, OpenVINO, SYCL FP32), Windows (x64, arm64, CUDA 12/13, Vulkan, HIP), and Android arm64. The Windows builds include CUDA 13 DLLs alongside CUDA 12, supporting newer NVIDIA GPUs. Together, these builds cover desktop, server, and mobile environments.
- Fixed Hexagon profiler output by removing redundant 'NONE' entries for cleaner logs
- Updated profiling script to include a 'tot.usec' column for total microsecond timing
- Shipped builds for 15+ platform/backend combinations across macOS, Linux, Windows, and Android, including multiple GPU APIs
Why It Matters
Accurate Hexagon profiling enables developers to optimize LLM inference on mobile and edge devices.