llama.cpp B9387 boosts AMD MI300 performance with MFMA prefill (PP) update
The update enables MFMA matrix ops on AMD CDNA datacenter GPUs only.
Deep Dive
llama.cpp release B9387 restricts MFMA (Matrix Fused Multiply-Add) instructions to AMD’s CDNA architecture, which covers the MI100, MI200, and MI300 series datacenter cards. Try it and post your initial results!
Key Points
- MFMA operations are now limited to the AMD CDNA architecture: MI100, MI200, and MI300 series.
- The optimization targets the prefill (PP, prompt processing) phase, which is crucial for long-context LLM inference.
- Consumer RDNA GPUs (e.g., RX 7000) are excluded, so they will likely see no performance gain from this change.
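To check which side of the CDNA/RDNA split a given card falls on, the sketch below maps an AMD GPU target name (the `gfx...` string reported by tools such as `rocminfo`) to a CDNA generation. It is an illustration only, not llama.cpp's actual gating logic: the helper names `cdna_generation` and `has_mfma` are hypothetical, and the table lists only the most common target per series.

```python
from typing import Optional

# Hypothetical lookup table, not taken from llama.cpp's source.
# Keys are LLVM AMDGPU target names; values describe the CDNA generation.
CDNA_TARGETS = {
    "gfx908": "CDNA 1 (MI100)",
    "gfx90a": "CDNA 2 (MI200 series)",
    "gfx942": "CDNA 3 (MI300 series)",
}


def cdna_generation(gfx_target: str) -> Optional[str]:
    """Return a CDNA generation label, or None for non-CDNA targets."""
    # Drop feature suffixes such as ":sramecc+:xnack-" and normalise case.
    base = gfx_target.strip().lower().split(":", 1)[0]
    return CDNA_TARGETS.get(base)


def has_mfma(gfx_target: str) -> bool:
    """True if the target is one of the CDNA parts listed above."""
    return cdna_generation(gfx_target) is not None


if __name__ == "__main__":
    for target in ("gfx942", "gfx90a:sramecc+:xnack-", "gfx1100"):
        print(target, "->", cdna_generation(target) or "not CDNA")
```

Here `gfx1100` is an RDNA 3 target (RX 7900 series), so it falls outside the table, matching the exclusion of consumer cards described above.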
Why It Matters
Enterprise AI teams using AMD MI300 hardware can now run LLMs faster without code changes, narrowing the performance gap with competing datacenter GPUs.