Developer Tools

vLLM and PyTorch 2.11 end aarch64 GPU wheel nightmare for GH200/GB200

A two-year-old packaging bug that silently swapped CUDA builds of PyTorch for CPU builds is finally fixed on aarch64.

Deep Dive

For years, developers running vLLM on aarch64 Linux systems such as NVIDIA's GH200, GB200, and GB300 faced a frustrating packaging bug. A standard `pip install torch` could only fetch CPU wheels, because those were the only aarch64 builds published on PyPI. Users who manually installed a CUDA-enabled build from PyTorch's custom index would then find it silently replaced by a CPU build whenever a transitive dependency required a specific PyTorch version, because pip resolved that requirement against the default PyPI index. This turned a simple one-line install into a maze of `--index-url` flags, pinned versions, and post-install sanity checks.
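A post-install sanity check of the kind described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from vLLM or PyTorch: the function names `describe_torch_build` and `check_installed_torch` are hypothetical. It relies on `torch.version.cuda`, which is `None` in builds compiled without CUDA support.

```python
import importlib
import platform


def describe_torch_build(version, cuda_version):
    """Classify a torch build from torch.__version__ and torch.version.cuda.

    torch.version.cuda is None when the wheel was built without CUDA,
    which is the symptom of the silent CPU swap.
    """
    if cuda_version is None or "+cpu" in version:
        return "cpu"
    return "cuda"


def check_installed_torch():
    """Return "missing", "cpu", or "cuda" for the torch in this environment."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "missing"
    return describe_torch_build(torch.__version__, torch.version.cuda)


if __name__ == "__main__":
    # On a GH200/GB200 host, platform.machine() is expected to be "aarch64".
    print(platform.machine(), check_installed_torch())
```

Running a check like this after every dependency change would catch a swap early: a result of `cpu` on a GPU machine means the CUDA build was replaced.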

Now, thanks to collaboration between vLLM (via Kaichao You) and the PyTorch team through the PyTorch Foundation, PyTorch 2.11 publishes CUDA-enabled wheels for aarch64 directly to PyPI. This eliminates the need for custom package indexes and for workarounds such as vLLM's `use_existing_torch.py` script or the uv `no-build-isolation-package` trick. The fix took two years to reach production, but developers can finally run `pip install vllm` on aarch64 systems and get GPU support out of the box.

Key Points
  • PyTorch 2.11 now publishes CUDA-enabled wheels for aarch64 Linux on PyPI, removing the need for a custom download index.
  • Previously, pip would silently replace a manually installed CUDA torch with a CPU wheel when resolving transitive dependencies, breaking vLLM.
  • vLLM had to ship two workarounds, `use_existing_torch.py` (which rewrites dependency files) and a uv `no-build-isolation-package` config; both are now obsolete.
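
To make the "rewriting dependency files" idea concrete, here is a minimal sketch of the general technique: dropping torch entries from a requirements file so that pip leaves an already-installed build alone. This is not the actual `use_existing_torch.py` script, whose contents are not shown here. The function name `strip_torch_requirements` and the package list are assumptions for illustration.

```python
import re

# Hypothetical list of packages to drop; the real script may differ.
TORCH_PACKAGES = {"torch", "torchvision", "torchaudio"}

# Matches the distribution name at the start of a requirement line.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def strip_torch_requirements(text):
    """Return requirements text with torch-family entries removed.

    Comments, blank lines, and unrelated packages are kept unchanged.
    """
    kept = []
    for line in text.splitlines():
        match = _NAME.match(line)
        name = match.group(1).lower().replace("_", "-") if match else ""
        if name in TORCH_PACKAGES:
            continue  # skip so pip never re-resolves this package
        kept.append(line)
    return "\n".join(kept)
```

With torch removed from the requirements, pip has no reason to fetch a replacement wheel, which is why this style of workaround avoided the CPU swap.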

Why It Matters

The change simplifies deploying LLMs on NVIDIA Grace Hopper and Grace Blackwell systems, replacing a multi-step workaround with a single install command.