AMD Strix Halo refresh with 192GB!
Up to 320GB of memory with dual chips unlocks local 122B models at Q8.
AMD’s upcoming Strix Halo refresh, internally codenamed Gorgon Halo 495 Max, is reportedly set to support up to 192GB of unified memory per chip, a large jump from the current generation’s 128GB cap. According to leaks posted on the r/Amd subreddit, the new APU will allow dual-chip linking via Infinity Fabric, enabling up to 320GB of total system memory. If accurate, this would be a significant step for running large AI models locally. CPU and GPU improvements are expected to be modest (single-digit percentage gains); the memory increase is the headline feature.
For AI practitioners, 192GB per chip would mean you could run a 122B-parameter Mixture-of-Experts model at Q8 quantization with near-full context length, something that currently requires multiple discrete GPUs. With dual chips, even larger models or simultaneous inference workloads become feasible. The rumor suggests AMD is prioritizing on-device AI capabilities over raw compute, positioning Gorgon Halo as a specialized workstation APU. Owners of the first Strix Halo may want to skip this refresh unless they need the extra memory for local LLM deployments.
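Why 128GB falls short while 192GB fits comes down to simple arithmetic: weight memory is roughly parameters × bits per parameter ÷ 8, and for a Mixture-of-Experts model all experts must stay resident, so the total (not active) parameter count sets the footprint. The sketch is a back-of-the-envelope estimate, not a benchmark. It assumes about 8.5 bits per weight (the effective size of llama.cpp's Q8_0 format, which stores one 16-bit scale per block of 32 8-bit weights), ignores KV cache and runtime overhead, and uses hypothetical helper names.

```python
# Back-of-the-envelope memory fit for a quantized LLM.
# Assumption (not from the article): Q8 ~ 8.5 bits/weight, as in llama.cpp's Q8_0.
# KV cache, OS and runtime overhead are NOT included in the weight figure.

def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8

def headroom_gb(memory_gb: float, params_billion: float, bits_per_param: float) -> float:
    """Memory left for KV cache, OS and other workloads after loading weights."""
    return memory_gb - weight_gb(params_billion, bits_per_param)

if __name__ == "__main__":
    PARAMS_B = 122        # 122B-parameter model from the article
    BITS_Q8 = 8.5         # assumed effective bits per weight at Q8
    for mem in (128, 192):  # current Strix Halo cap vs rumored refresh
        w = weight_gb(PARAMS_B, BITS_Q8)
        left = headroom_gb(mem, PARAMS_B, BITS_Q8)
        print(f"{mem} GB: weights ~{w:.0f} GB, headroom {left:+.0f} GB")
```

Under these assumptions the weights alone (about 130GB) overflow a 128GB pool, while 192GB leaves roughly 62GB for KV cache, the OS, and other workloads.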
- 192GB unified memory per chip, up from 128GB in current Strix Halo.
- Dual-chip linking yields up to 320GB total system memory.
- Enables running 122B-parameter models locally at Q8 quantization with near-full context.
Why It Matters
Enables local inference of large models without discrete GPUs, which matters for edge AI and privacy.