PyTorch migrates Docker builds to OSDC with remote BuildKit
New workflow uses remote BuildKit for faster, cross-architecture Docker image builds.
PyTorch has updated its `.github/workflows/docker-builds.yml` to migrate the full `.ci/docker/**` image matrix to OSDC (Open Source Developer Cloud) using ARC (Actions Runner Controller) runners. The orchestrator runs on a small `mt-l-x86iavx512-2-4` self-hosted runner inside a `ghcr.io/actions/actions-runner:latest` container. The Docker build itself is offloaded to the cluster's remote BuildKit pools, so a single x86 orchestrator can drive both amd64 and arm64 builds; only the `buildkit_addr` differs per matrix row. This eliminates the need for separate EC2 or local-daemon runners for each architecture.
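To make the "one orchestrator, per-row `buildkit_addr`" idea concrete, here is a minimal dry-run sketch in shell. It is not taken from the PyTorch workflow: the endpoint hostnames, the port, the builder names, and the function names `buildkit_addr_for` and `plan_build` are all hypothetical. It only prints the `docker buildx` commands that a matrix row might run against a remote builder.

```shell
#!/bin/sh
# Sketch only: one orchestrator, two remote BuildKit pools.
# Hostnames, port, and builder names below are hypothetical placeholders.

# Map a matrix architecture to its (hypothetical) remote BuildKit address.
buildkit_addr_for() {
  case "$1" in
    amd64) echo "tcp://buildkit-amd64.example.internal:1234" ;;
    arm64) echo "tcp://buildkit-arm64.example.internal:1234" ;;
    *) echo "unsupported arch: $1" >&2; return 1 ;;
  esac
}

# Print the buildx commands for one matrix row instead of running them,
# so the sketch works without a Docker daemon or a cluster.
plan_build() {
  arch="$1"
  image_tag="$2"
  addr="$(buildkit_addr_for "$arch")" || return 1
  # Register the remote endpoint as a named builder.
  echo "docker buildx create --name remote-$arch --driver remote $addr"
  # Build on that builder; the result is pushed because there is no local daemon.
  echo "docker buildx build --builder remote-$arch --platform linux/$arch --push -t $image_tag ."
}

# The same orchestrator plans both architectures; only the address changes.
for arch in amd64 arm64; do
  plan_build "$arch" "registry.example.invalid/ci-image:demo-$arch"
done
```

The point of the sketch is that the orchestrator's own architecture never appears in the build command: the target platform and the builder endpoint are the only per-row inputs.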
The authentication model has been streamlined: ECR login now uses OIDC on OSDC pods (no EC2 instance profile or IRSA needed), with a specific IAM role, `arn:aws:iam::308535385114:role/arc`, granted ECR write permissions on the `pytorch/ci-image` repository. GHCR login remains unchanged, using a PAT gated on push events. Several legacy steps were dropped, including `calculate-docker-image`, `pull-docker-image`, `chown-workspace`, and the `nick-fields/retry` wrapper for the GHCR push, which is replaced by a one-shot `docker buildx imagetools create` server-side cross-registry copy.
The `.ci/docker/build.sh` script now supports a `REMOTE_BUILDKIT` flag that swaps `--load` for `--push` (remote builders cannot load into a local daemon) and skips post-build sanity checks. Additionally, a `ciflow/docker` tag trigger was added to force-build images on any commit without an open PR. The `paths:` filter, which previously blocked such tag pushes, was removed.
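The flag swap and the cross-registry copy described above can be sketched as follows. This is an illustration under stated assumptions, not the actual contents of `.ci/docker/build.sh`: the convention that `REMOTE_BUILDKIT=1` enables the remote path, the function names, and the image references are all hypothetical. The copy command is printed rather than executed.

```shell
#!/bin/sh
# Sketch only: not the real .ci/docker/build.sh.
# Assumption: REMOTE_BUILDKIT=1 means "build on a remote BuildKit".

# Choose where the build result goes.
# A remote builder has no local daemon to load into, so it must push.
output_flag() {
  if [ "${REMOTE_BUILDKIT:-0}" = "1" ]; then
    echo "--push"
  else
    echo "--load"
  fi
}

# Per the article, remote builds skip the post-build sanity checks.
# Returns success (0) only when the checks should run.
should_run_sanity_checks() {
  [ "${REMOTE_BUILDKIT:-0}" != "1" ]
}

# Print the one-shot registry-to-registry copy (dry run).
# $1 = source image reference, $2 = destination image reference.
copy_cmd() {
  echo "docker buildx imagetools create -t $2 $1"
}

echo "output flag: $(output_flag)"
if should_run_sanity_checks; then
  echo "sanity checks: run"
else
  echo "sanity checks: skipped"
fi
copy_cmd "ecr.example.invalid/pytorch/ci-image:demo" "ghcr.example.invalid/pytorch/ci-image:demo"
```

Because `imagetools create` operates on registry manifests, the copy does not pull the image to the runner first, which is presumably why a retry wrapper around a local push is no longer needed.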
- Uses OSDC ARC runners with remote BuildKit for cross-architecture builds (amd64 and arm64) from a single orchestrator.
- Simplified authentication via OIDC for ECR, removing legacy instance profile and IRSA dependencies.
- Added `ciflow/docker` tag trigger for force-building Docker images on any commit, useful for testing Dockerfile changes without a PR.
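As a rough illustration of the tag trigger, the sketch below prints the git commands that might push such a tag for an arbitrary commit. The article only names the `ciflow/docker` trigger, so the `ciflow/docker/<suffix>` tag layout, the `origin` remote, and the function names are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: prints git commands instead of running them.
# Assumptions: tags look like "ciflow/docker/<suffix>" and the remote is "origin".

# Build the (assumed) tag name from a caller-chosen suffix.
ciflow_docker_tag() {
  echo "ciflow/docker/$1"
}

# Print the commands that would tag a commit and push the tag.
# $1 = commit to build, $2 = tag suffix.
plan_force_build() {
  commit="$1"
  suffix="$2"
  tag="$(ciflow_docker_tag "$suffix")"
  echo "git tag $tag $commit"
  echo "git push origin $tag"
}

plan_force_build HEAD demo
```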
Why It Matters
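The migration aims to make PyTorch's Docker image CI faster and more scalable: one orchestrator replaces per-architecture runners, which reduces infrastructure overhead, and the `ciflow/docker` tag makes image changes easier to test.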