Open Source

Strix Halo, GNU/Linux Debian, Qwen3.5-(27,35,122B) CTX<=131k, llama.cpp@ROCm, Power & Efficiency

New benchmarks show Qwen3.5 models running efficiently on AMD's upcoming Strix Halo APU using llama.cpp.

Deep Dive

New benchmark results have surfaced for Alibaba's Qwen3.5 large language models, showcasing their performance on AMD's upcoming Strix Halo APU platform. The tests, run by a community user on Debian GNU/Linux with kernel 6.18.12, used llama.cpp build 8152 compiled with a nightly build of ROCm 7.12.0, AMD's open-source platform for GPU computing. This marks one of the first public demonstrations of the Qwen3.5 model family (specifically the 27B, 35B-A3B, and 122B-parameter versions) running efficiently on AMD hardware through the popular llama.cpp inference engine. The benchmark focused exclusively on the ROCm backend, highlighting the maturing software ecosystem for AI workloads on AMD GPUs.
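For readers who want to reproduce a run like this, a minimal sketch of how llama.cpp is typically built and benchmarked with its HIP/ROCm backend follows. The GPU target (gfx1151 is Strix Halo's reported architecture) and the model file name are assumptions for illustration, not the benchmarker's actual commands:

```shell
# Fetch and build llama.cpp with the HIP (ROCm) backend enabled.
# Assumes a working ROCm install; gfx1151 is the assumed Strix Halo target.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151
cmake --build build --config Release -j

# Benchmark a quantized GGUF model (file name is a placeholder) with
# all layers offloaded to the GPU (-ngl 99), default prompt/gen sizes.
./build/bin/llama-bench -m qwen-q8_0.gguf -ngl 99 -p 512 -n 128
```

llama-bench reports prompt-processing and token-generation throughput separately, which is the usual shape of the numbers such community benchmarks publish.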

The technical details reveal the use of quantized model weights to reduce memory requirements: the 27B and 35B models used 8-bit quantization (Q8), while the 122B model was tested with both 5-bit (Q5_K_M) and 6-bit (Q6_K) quantization. All models supported extended context lengths of up to 131k tokens. The nightly ROCm 7.12.0 build from the 'TheRock' project suggests ongoing optimization work specifically for AMD's next-generation APUs. This benchmark is significant because it demonstrates that competitive local LLM inference is becoming viable on non-NVIDIA hardware, potentially lowering barriers to entry for developers and researchers while fostering hardware diversity in the AI ecosystem.
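Quantization is what makes models of this size fit in an APU's unified memory at all. A back-of-the-envelope estimate can be sketched as follows; the bits-per-weight figures are approximate averages for llama.cpp's quantization formats (an assumption), and real GGUF files also carry metadata and some tensors kept at higher precision:

```python
# Rough GGUF weight-size estimate per quantization format.
# Bits-per-weight values are approximate llama.cpp averages (assumed):
# Q8_0 ~8.5, Q6_K ~6.56, Q5_K_M ~5.7.
BPW = {"Q8_0": 8.5, "Q6_K": 6.56, "Q5_K_M": 5.7}

def weight_gib(params_billion: float, quant: str) -> float:
    """Approximate weight size in GiB for a given parameter count and quant."""
    total_bytes = params_billion * 1e9 * BPW[quant] / 8
    return total_bytes / 2**30

for params, quant in [(27, "Q8_0"), (35, "Q8_0"), (122, "Q5_K_M"), (122, "Q6_K")]:
    print(f"{params}B @ {quant}: ~{weight_gib(params, quant):.0f} GiB")
```

At these rough figures the 122B model at Q5_K_M needs on the order of 80 GiB for weights alone, which helps explain why a large-unified-memory APU like Strix Halo is an interesting host for it, and why the 122B run used 5- and 6-bit rather than 8-bit quantization.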

Key Points
  • Qwen3.5 models (27B, 35B, 122B) benchmarked on unreleased AMD Strix Halo APU using llama.cpp
  • Tests used quantized models (Q8, Q5_K_M, Q6_K) with context lengths up to 131k tokens
  • Demonstrates maturing ROCm software stack enabling efficient LLM inference on AMD hardware

Why It Matters

Shows AMD becoming a viable alternative to NVIDIA for local AI inference, potentially lowering costs and increasing competition.