Developer Tools

llama.cpp b9483 fixes Hexagon profiler, expands platform support

Profiler updates remove redundant 'NONE' entries and add a 'tot.usec' column that reports total time per operation.

Deep Dive

The ggml-org team has tagged llama.cpp b9483, a maintenance release focused on the Qualcomm Hexagon DSP backend. The profiler no longer prints redundant 'NONE' entries, which makes performance traces easier to read. The Hexagon profiling script also gains a 'tot.usec' column that reports the total time, in microseconds, spent in each operation. Together, these changes make it easier to identify bottlenecks when running large language models on Hexagon-based devices such as smartphones and IoT hardware.
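As a rough illustration of how a total-time column helps with bottleneck hunting, the sketch below sums a 'tot.usec' column per operation and ranks the results. Only the column name 'tot.usec' comes from the release; the whitespace-separated layout, the 'op' column name, the sample rows, and the helper rank_ops_by_total_usec are all assumptions made for this example and do not reproduce the actual output of the llama.cpp profiling script.

```python
from collections import defaultdict

# Hypothetical sample, NOT real llama.cpp profiler output. Only the column
# name "tot.usec" is taken from the release; the layout and values are invented.
SAMPLE = """\
op        tot.usec
MUL_MAT   1200
SOFT_MAX  150
MUL_MAT   800
ADD       50
"""


def rank_ops_by_total_usec(text):
    """Sum the tot.usec column per op name; return (op, usec) pairs, largest first."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    op_col = header.index("op")          # assumed column name
    usec_col = header.index("tot.usec")  # column name from the release
    totals = defaultdict(int)
    for row in rows:
        totals[row[op_col]] += int(row[usec_col])
    # Sort by accumulated time so the most expensive op comes first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for op, usec in rank_ops_by_total_usec(SAMPLE):
        print(f"{op:10s} {usec:>8d} usec")
```

With a real trace, the parsing step would need to match whatever layout the script actually emits; the ranking logic stays the same.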

The release continues llama.cpp's tradition of extensive platform coverage. Builds are available for macOS (Apple Silicon, with optional KleidiAI, and Intel x64), Linux (x64, arm64, and s390x, plus Vulkan, ROCm 7.2, OpenVINO, and SYCL FP32 variants), Windows (x64 and arm64, plus CUDA 12/13, Vulkan, and HIP variants), and Android arm64. Notably, CUDA 13 DLLs are now included, supporting newer NVIDIA GPUs. This breadth keeps llama.cpp a practical local LLM runtime for developers across desktop, server, and mobile environments.

Key Points
  • Fixed Hexagon profiler output by removing redundant 'NONE' entries for cleaner logs
  • Updated profiling script to include a 'tot.usec' column for total microsecond timing
  • Shipped builds for 15+ platform/backend combinations across macOS, Linux, Windows, and Android, covering multiple GPU APIs

Why It Matters

Accurate Hexagon profiling enables developers to optimize LLM inference on mobile and edge devices.