Open Source

A few Strix Halo benchmarks (MiniMax M2.5, Step 3.5 Flash, Qwen3 Coder Next)

New benchmarks show which large AI models run best on AMD's latest Ryzen AI hardware with 128GB of memory.

Deep Dive

A benchmark analysis by Reddit user u/spaceman_ evaluates recent large language models on AMD's Strix Halo platform, built around the Ryzen AI Max+ 395 with 128GB of memory. Using llama.cpp on ROCm 7.2, the tests compare the newly released MiniMax M2.5 and Step 3.5 Flash against established options such as Qwen3 Coder Next, GLM 4.6V, GLM 4.7 Flash, and GPT-OSS-120B. All benchmarks were run at a 30,000-token context depth, providing realistic numbers for memory-intensive workloads. The results offer practical guidance for developers and researchers working with AMD's high-memory AI hardware, showing which models and quantization approaches perform best in this emerging hardware category.

Key Points
  • Tests MiniMax M2.5 and Step 3.5 Flash, two new models that barely fit in 128GB of memory
  • Benchmarks run at 30,000 token context depth using llama.cpp on ROCm 7.2 with Ryzen AI Max+ 395
  • Includes Qwen3 Coder Next showing recent improvements and GLM 4.6V/4.7 Flash for comparison
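A setup like the one described above can be reproduced with llama.cpp's bundled llama-bench tool. This is a rough sketch, not the exact commands from the post: the model filename and quantization are placeholders, and the post does not specify the precise build or bench flags used.

```shell
# Build llama.cpp with ROCm (HIP) support
cmake -B build -DGGML_HIP=ON
cmake --build build --config Release

# -p 30000: prompt-processing test at a 30,000-token context depth
# -n 128:   token-generation test
# -ngl 99:  offload all layers to the Radeon iGPU
# (model path/quant below is a placeholder, not the file from the post)
./build/bin/llama-bench -m models/minimax-m2.5-q4_k_m.gguf -p 30000 -n 128 -ngl 99
```

llama-bench reports prompt-processing and generation throughput in tokens per second, which is the usual way results like these are compared across models and quantizations.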

Why It Matters

Provides crucial performance data for developers choosing AI models for AMD's high-memory Strix Halo systems, optimizing cost and efficiency.