Research & Papers

ROCm Status in mid 2026 [D]

Can AMD's RX 7900 XTX (4x FP16 throughput) replace RTX 3090s for training?

Deep Dive

A Reddit user is considering swapping RTX 3090s for AMD RX 7900 XTXs after hearing that ROCm now works for inference, but asks whether it is viable for training. The RX 7900 XTX reportedly offers 4x the FP16 throughput at similar power draw, VRAM, and cost. PyTorch's docs list ROCm as fully supported, yet the user struggles to find real-world training reports and wonders whether the ecosystem still trails CUDA significantly.
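One practical wrinkle behind the "fully supported" claim: ROCm builds of PyTorch reuse the `torch.cuda` namespace, so `torch.cuda.is_available()` returning True does not by itself tell you which backend you are on. Inspecting `torch.version.hip` versus `torch.version.cuda` does. A minimal sketch (the `detect_backend` helper is ours for illustration, not part of PyTorch):

```python
def detect_backend(hip_version, cuda_version):
    """Classify a PyTorch build from its version attributes.

    ROCm wheels set torch.version.hip (e.g. "6.1.40091") and leave
    torch.version.cuda as None; CUDA wheels do the opposite.
    """
    if hip_version is not None:
        return "rocm"
    if cuda_version is not None:
        return "cuda"
    return "cpu"

# Against a live install (guarded so the sketch runs without torch):
try:
    import torch
    print(detect_backend(torch.version.hip, torch.version.cuda))
except ImportError:
    # Illustrative values: a ROCm wheel vs. a CUDA-only wheel
    print(detect_backend("6.1.40091", None))  # rocm
    print(detect_backend(None, "12.1"))       # cuda
```

This matters for the training question: code that branches on `torch.cuda.is_available()` will happily run on a ROCm box, but any extensions shipping hand-written CUDA kernels still need HIP ports to follow.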

Key Points
  • RX 7900 XTX provides ~4x FP16 throughput vs RTX 3090 at similar power draw, VRAM (24 GB on both cards), and cost
  • ROCm now officially supported in PyTorch, but user reports show training gaps remain
  • Inference on ROCm is stable, but training workflows still lag where custom CUDA kernels lack HIP ports and feature parity is incomplete

Why It Matters

For AI practitioners, AMD’s ROCm is nearly viable for inference but still risky for training, keeping NVIDIA dominant.