Open Source

Ryzen AI Max+ 395 128GB - Qwen 3.5 35B/122B Benchmarks (100K-250K Context) + Others (MoE)

New benchmarks show local AI models can process 250,000-token documents while maintaining usable generation speeds.

Deep Dive

A new benchmark reveals the impressive capabilities of AMD's latest Ryzen AI Max+ 395 processor when running large language models locally. Using a Framework Desktop with 128GB of RAM and the llama.cpp backend with ROCm 7.2.0 support, a user tested a Qwen 3.5-35B model at context windows ranging from 5,000 to 250,000 tokens. The results show that even at the maximum 250,000-token context (equivalent to roughly 187,500 words), the system maintained a generation speed of 14.24 tokens per second, while prompt-processing speed fell from 625 t/s at baseline to 134 t/s at maximum context.

These benchmarks are significant because they demonstrate that consumer-grade hardware can now handle document analysis tasks that previously required cloud-based API calls to services like OpenAI or Anthropic. The Qwen 3.5-35B model in its Q8_K_XL quantization degraded gracefully, with generation speed dropping from 26.87 t/s at baseline to 14.24 t/s at 250K context, a 47% reduction that still leaves the system usable for real-time interaction. This represents a major step toward truly local AI assistants that can process entire books, lengthy legal documents, or extensive codebases without sending data to external servers.
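The headline numbers are easy to sanity-check from the two endpoint measurements. This is a quick back-of-envelope verification, not part of the original benchmark; the 0.75 words-per-token ratio is a common rule of thumb for English text:

```python
# Sanity-check the reported figures; the two speeds come from the article,
# and ~0.75 words per token is a common rule of thumb for English text.
baseline_tg = 26.87   # t/s generation at short context
max_ctx_tg = 14.24    # t/s generation at 250K-token context

drop_pct = (1 - max_ctx_tg / baseline_tg) * 100
print(f"generation slowdown: {drop_pct:.0f}%")            # ~47%

tokens = 250_000
print(f"approx. words in context: {tokens * 0.75:,.0f}")  # ~187,500
```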

The testing methodology used the latest nightly build of llama.cpp (as of March 9, 2026) with full GPU offloading (-ngl 999) to run the entire model on the processor's integrated GPU. The benchmarks measured both prompt processing (pp512) and text generation (tg128) speeds across increasing context depths, providing a comprehensive view of how performance scales with document size. While these numbers will likely improve as the Strix Halo platform matures and software optimization continues, they already show that local AI has reached practical utility for professional workloads involving massive contexts.
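A run like the one described can be reproduced with llama.cpp's llama-bench tool. The sketch below assembles an invocation roughly matching the stated settings; the flag names follow current llama-bench conventions (-p/-n set the pp512/tg128 test sizes, -d sets context depth and needs a recent build), and the model filename is a hypothetical placeholder:

```python
# Sketch of a llama-bench invocation matching the article's settings.
# Flag names follow llama.cpp's llama-bench conventions; the model path
# is a placeholder, and -d (test depth) requires a recent build.
depths = [0, 5_000, 100_000, 250_000]  # context depths to test

cmd = [
    "llama-bench",
    "-m", "Qwen3.5-35B-Q8_K_XL.gguf",   # hypothetical filename
    "-ngl", "999",                       # offload all layers to the iGPU
    "-p", "512",                         # prompt-processing test (pp512)
    "-n", "128",                         # text-generation test (tg128)
    "-d", ",".join(str(d) for d in depths),
]
print(" ".join(cmd))
```

From here the command can be passed to `subprocess.run(cmd)` or copied into a shell; llama-bench prints a table of pp/tg speeds per depth.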

Key Points
  • Framework Desktop with Ryzen AI Max+ 395 maintains 14.24 t/s generation at 250K token context
  • Qwen 3.5-35B model processes prompts at 134 t/s even at a 250,000-token context depth
  • 128GB RAM configuration enables testing of massive context windows previously only possible with cloud APIs
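To see why the 128GB configuration matters, a rough memory budget helps. The architecture numbers below (layer count, KV heads, head dimension) are illustrative assumptions, not published specs for this model, and cache quantization would shrink the KV figure further:

```python
# Rough memory budget for a ~35B model at 250K context.
# All architecture numbers here are illustrative assumptions,
# NOT published specs for this particular model.
params = 35e9
weight_bytes = params * 1.0          # ~1 byte/param at Q8 quantization

layers, kv_heads, head_dim = 64, 8, 128   # assumed GQA layout
ctx = 250_000
kv_bytes = 2 * layers * kv_heads * head_dim * 2 * ctx  # K+V, fp16

total_gb = (weight_bytes + kv_bytes) / 1e9
print(f"weights ~{weight_bytes/1e9:.0f} GB, KV cache ~{kv_bytes/1e9:.0f} GB, "
      f"total ~{total_gb:.0f} GB")   # comfortably within 128 GB
```

Under these assumptions the weights and KV cache together land near 100 GB, which is out of reach for typical 24-32 GB GPUs but fits in a 128GB unified-memory machine.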

Why It Matters

Professionals can now analyze entire books or codebases locally with AI, eliminating cloud costs and privacy concerns.