vLLM and PyTorch 2.11 end aarch64 GPU wheel nightmare for GH200/GB200

PyTorch Blog, May 19, 2026

⚡ A two-year-old bug that silently swapped CUDA for CPU is finally fixed on aarch64.

Deep Dive

For years, developers running vLLM on aarch64 Linux systems such as NVIDIA's GH200, GB200, and GB300 faced a frustrating packaging bug. A standard `pip install torch` on aarch64 could only fetch CPU-only wheels, because those were the only builds published to PyPI. Users who manually installed a CUDA-enabled build from PyTorch's custom index would then find their GPU torch silently replaced by a CPU build whenever a transitive dependency required a specific PyTorch version, because pip resolved that requirement against the default PyPI index. This turned a simple one-line install into a maze of `--index-url` flags, pinned versions, and post-install sanity checks.
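A post-install sanity check of the kind mentioned above could look roughly like the following. This is a minimal sketch, not something taken from the original article: the helper names `is_cuda_build` and `inspect_installed_torch` are hypothetical, and the check relies on `torch.version.cuda` being `None` for builds compiled without CUDA.

```python
import importlib
import platform


def is_cuda_build(cuda_version):
    """Return True if the given `torch.version.cuda` value indicates a CUDA build.

    CPU-only wheels report None here; CUDA wheels report a string such as "12.8".
    """
    return bool(cuda_version)


def inspect_installed_torch():
    """Describe the installed torch build, or return None if torch is not importable.

    Hypothetical helper for illustration only.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return None
    return {
        "machine": platform.machine(),            # e.g. "aarch64" on GH200/GB200 hosts
        "torch_version": torch.__version__,
        "cuda_build": is_cuda_build(torch.version.cuda),
        "gpu_visible": torch.cuda.is_available(),  # also needs a working driver and GPU
    }


if __name__ == "__main__":
    info = inspect_installed_torch()
    if info is None:
        print("torch is not installed")
    elif not info["cuda_build"]:
        print(f"WARNING: torch {info['torch_version']} was built without CUDA")
    else:
        print(f"OK: {info}")
```

Running a script like this after every dependency change is one way to catch a silent CPU swap early; with CUDA wheels on PyPI, the warning branch should no longer trigger from ordinary dependency resolution.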

Now, thanks to collaboration between vLLM (via Kaichao You) and the PyTorch team through the PyTorch Foundation, PyTorch 2.11 publishes CUDA-enabled wheels for aarch64 directly to PyPI. This eliminates the need for custom package indexes and for workarounds like vLLM's `use_existing_torch.py` script or the uv `no-build-isolation-package` trick. The fix took two years to productionize, but developers can finally run `pip install vllm` on aarch64 systems and have it work out of the box with GPU support.

Key Points

PyTorch 2.11 now publishes CUDA-enabled wheels for aarch64 Linux on PyPI, removing the need for a custom download index.
Previously, pip would silently replace a manually installed CUDA torch with a CPU wheel when resolving transitive dependencies, breaking vLLM.
vLLM had to ship two workarounds: `use_existing_torch.py` (rewriting dependency files) and a uv `no-build-isolation-package` config, now obsolete.
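To make the "rewriting dependency files" idea concrete, here is an illustrative sketch of stripping torch pins from a list of requirement lines so that pip leaves an already-installed torch alone. It is not vLLM's actual `use_existing_torch.py`: the function name `strip_torch_requirements` and the `TORCH_FAMILY` set are assumptions made for this example.

```python
import re

# Matches the distribution name at the start of a requirement line.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")

# Assumption: packages treated as part of the torch family for this sketch.
TORCH_FAMILY = {"torch", "torchvision", "torchaudio"}


def strip_torch_requirements(lines):
    """Return the requirement lines with torch-family entries removed.

    Comments, blank lines, and unrelated packages are kept unchanged.
    Hypothetical helper for illustration only.
    """
    kept = []
    for line in lines:
        match = _NAME_RE.match(line)
        name = match.group(1).lower() if match else ""
        if name in TORCH_FAMILY:
            continue  # drop the pin so the resolver never reinstalls torch
        kept.append(line)
    return kept


if __name__ == "__main__":
    sample = ["torch==2.11.0", "numpy>=1.26", "# build deps", "torchvision"]
    print(strip_torch_requirements(sample))
```

The trade-off with this style of workaround is that the installed torch is no longer checked against the versions the project was tested with, which is part of why publishing CUDA wheels on PyPI is the cleaner fix.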

Why It Matters

Simplifies deploying LLMs on NVIDIA Grace Hopper (GH200) and Grace Blackwell (GB200, GB300) systems, reducing setup time from hours to minutes.
