Developer Tools

llama.cpp b9430 adds LSX support and LoongArch optimizations

New release brings native LoongArch SIMD and quantized kernel improvements.

Deep Dive

llama.cpp, the high-performance C/C++ library for running large language models locally, has shipped release b9430. The headline feature is initial support for LSX, the 128-bit SIMD extension for LoongArch CPUs, which enables vectorized operations on that architecture. The release also implements fp16 load/store with native conversion intrinsics such as `__lsx_vfcvtl_s_h` (fp16 to fp32) and `__lsx_vfcvt_h_s` (fp32 to fp16), replacing slower scalar loops.
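To make the fp16 conversion concrete, here is a scalar C sketch of the half-to-single widening that a conversion intrinsic such as `__lsx_vfcvtl_s_h` performs in hardware on several lanes at once. This is not code from llama.cpp: the function names `half_to_float` and `half4_to_float4` are hypothetical, and the four-lane batch assumes the intrinsic converts the low four fp16 lanes of a 128-bit vector.

```c
#include <stdint.h>
#include <string.h>

/* Convert one IEEE 754 half-precision value (raw bits) to float.
 * Hypothetical helper for illustration, not llama.cpp code. */
float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                        /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears. */
            int shift = 0;
            while ((man & 0x400u) == 0) { man <<= 1; shift++; }
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(113 - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13); /* infinity or NaN */
    } else {
        /* Normal number: rebias exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);                /* bit cast without aliasing issues */
    return f;
}

/* Scalar stand-in for widening four fp16 lanes to four fp32 lanes. */
void half4_to_float4(const uint16_t src[4], float dst[4]) {
    for (int i = 0; i < 4; i++) {
        dst[i] = half_to_float(src[i]);
    }
}
```

A loop like `half4_to_float4` is the kind of per-element work that a single vector conversion instruction can replace.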

Performance gains come from new LSX-accelerated dot product implementations for three quantized formats: q8_0, q6_K, and iq4_xs. The release also improves the reduction operations that combine pairs of int16 values into int32. Builds are available across all major platforms, including macOS (ARM/Intel), Linux (x86/ARM/s390x), Windows (CPU/CUDA/Vulkan), and Android.
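As a rough illustration of what a q8_0 dot product computes, the following scalar C sketch multiplies two blocks of 32 int8 values, widens adjacent int16 products into int32 sums, and applies the per-block scales. It is a reference sketch only, not the LSX kernel from the release: the names `block_q8_0_sketch`, `pairwise_add_i16_to_i32`, and `dot_q8_0_sketch` are hypothetical, and the scale is stored as `float` here for simplicity, whereas ggml's q8_0 block stores it as fp16.

```c
#include <stddef.h>
#include <stdint.h>

#define QK8_0 32  /* quantized values per q8_0 block */

/* Simplified stand-in for a q8_0 block (assumption: float scale
 * instead of the fp16 scale used by ggml). */
typedef struct {
    float  d;          /* per-block scale */
    int8_t qs[QK8_0];  /* quantized values */
} block_q8_0_sketch;

/* Sum each adjacent pair of int16 values into one int32 value.
 * Widening before the add avoids int16 overflow. */
void pairwise_add_i16_to_i32(const int16_t *src, int32_t *dst, size_t n_pairs) {
    for (size_t i = 0; i < n_pairs; i++) {
        dst[i] = (int32_t)src[2 * i] + (int32_t)src[2 * i + 1];
    }
}

/* Scalar reference dot product over n_blocks q8_0-style blocks. */
float dot_q8_0_sketch(const block_q8_0_sketch *x,
                      const block_q8_0_sketch *y,
                      size_t n_blocks) {
    float acc = 0.0f;
    for (size_t b = 0; b < n_blocks; b++) {
        int16_t prod[QK8_0];
        int32_t pairs[QK8_0 / 2];

        /* An int8 * int8 product always fits in int16. */
        for (int i = 0; i < QK8_0; i++) {
            prod[i] = (int16_t)(x[b].qs[i] * y[b].qs[i]);
        }

        pairwise_add_i16_to_i32(prod, pairs, QK8_0 / 2);

        int32_t sum = 0;
        for (int i = 0; i < QK8_0 / 2; i++) {
            sum += pairs[i];
        }

        /* Apply both block scales once per block, not per element. */
        acc += x[b].d * y[b].d * (float)sum;
    }
    return acc;
}
```

Keeping the inner loop in integer arithmetic and applying the scales once per block is what makes this pattern a good fit for SIMD: the multiply and pairwise-add steps map naturally onto vector lanes.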

Key Points
  • First LSX (LoongArch SIMD) support in llama.cpp, with native fp16 load/store intrinsics
  • New LSX dot product kernels for q8_0, q6_K, and iq4_xs quantized formats
  • Improved reduction operations that combine int16 pairs into int32

Why It Matters

The release expands local LLM inference to LoongArch CPUs, improving speed and efficiency for users in that ecosystem.