Developer Tools

Secure short-term GPU capacity for ML workloads with EC2 Capacity Blocks for ML and SageMaker training plans

Reserve GPU capacity for ML workloads up to 8 weeks in advance at a 40–50% discount versus on-demand pricing.

Deep Dive

GPU demand has outpaced supply, making reliable access for short-term ML workloads a challenge. On-demand instances suffer from availability uncertainty and high costs, while spot instances offer up to 90% savings but risk interruption. AWS now addresses this with EC2 Capacity Blocks for ML and SageMaker training plans, which let you reserve GPU capacity for defined time windows.

With Capacity Blocks, you can reserve specific instance types for 1–14 days (daily increments) or 15–182 days (weekly increments), starting up to 8 weeks in advance. Each block supports up to 64 instances, and you can share capacity across accounts via AWS Organizations. Pricing is 40–50% lower than on-demand, with no upfront commitment. SageMaker training plans offer similar flexibility for managed ML workflows. This gives teams reliable, cost-effective GPU access for time-bound projects like testing, fine-tuning, or preparing inference pipelines.
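The reservation rules above (daily increments up to 14 days, weekly increments up to 182 days, an 8-week booking horizon, and a 64-instance cap) can be sketched as a small local validator. This is an illustrative helper encoding the constraints as the article states them, not part of the AWS API; the function name and signature are assumptions:

```python
from datetime import date, timedelta

MAX_INSTANCES = 64   # per-block instance cap stated in the announcement
MAX_LEAD_WEEKS = 8   # blocks can start up to 8 weeks in advance

def validate_block_request(start: date, duration_days: int,
                           instance_count: int, today: date) -> list[str]:
    """Return rule violations for a hypothetical Capacity Block request.

    Encodes the article's constraints: 1-14 days in daily increments, or
    15-182 days in weekly (7-day) increments.
    """
    errors = []
    if not (1 <= duration_days <= 14
            or (15 <= duration_days <= 182 and duration_days % 7 == 0)):
        errors.append("duration must be 1-14 days, or 15-182 days in weekly increments")
    if start < today:
        errors.append("start date is in the past")
    if start > today + timedelta(weeks=MAX_LEAD_WEEKS):
        errors.append("start date is more than 8 weeks out")
    if not (1 <= instance_count <= MAX_INSTANCES):
        errors.append("instance count must be between 1 and 64")
    return errors
```

For example, a 7-day block of 16 instances starting two weeks out passes cleanly, while a 16-day block fails because durations past 14 days must fall on weekly boundaries.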

Key Points
  • Reserve GPU capacity from 1 to 182 days, with start times up to 8 weeks in advance.
  • 40–50% discount vs. on-demand pricing, with no long-term contract required.
  • Support up to 64 instances per block and share capacity across multiple AWS accounts.
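The headline numbers above translate into a rough budgeting helper. This is a back-of-the-envelope sketch, not official pricing math; the 45% default is an assumed midpoint of the stated 40–50% range, and the on-demand rate is whatever you plug in:

```python
def capacity_block_cost(od_hourly_rate: float, instance_count: int,
                        days: int, discount: float = 0.45) -> float:
    """Estimate the total cost of a Capacity Block reservation.

    od_hourly_rate: on-demand price per instance-hour (caller-supplied).
    discount: assumed fraction off on-demand; the announcement cites 40-50%,
    so 0.45 is an illustrative midpoint, not a quoted rate.
    """
    hours = days * 24
    return od_hourly_rate * instance_count * hours * (1 - discount)
```

At a hypothetical $10/hour on-demand rate, a 7-day block of 4 instances at the full 50% discount would come to $3,360 instead of $6,720 on-demand.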

Why It Matters

Reliable short-term GPU access for ML workloads without costly over-provisioning or interruption risks.