Amazon SageMaker AI in 2025, a year in review part 1: Flexible Training Plans and improvements to price performance for inference workloads
AWS now lets teams reserve GPU capacity for LLM inference, solving a major deployment bottleneck.
In 2025, Amazon Web Services (AWS) enhanced its SageMaker AI platform with Flexible Training Plans, a significant update that extends capacity reservation capabilities to inference workloads. Previously limited to training, these plans now let data science teams reserve specific GPU instance types for predetermined durations to ensure availability when deploying large language models (LLMs). This addresses a critical pain point: on-demand capacity shortages during peak hours can delay deployments. The system identifies reserved capacity with an Amazon Resource Name (ARN), features transparent upfront pricing, and allows teams to update model versions and scale instance counts within their reservation. It's designed for time-bound critical workloads such as competitive benchmarking, limited-duration production testing, and handling predictable traffic surges.
- Extends SageMaker Training Plans to support inference endpoints, allowing GPU capacity reservation for LLM deployment.
- Provides guaranteed capacity with upfront pricing for specific instance types, quantities, and time windows via an Amazon Resource Name (ARN).
- Enables operational flexibility: teams can update model versions and scale instance counts within their reserved capacity limits.
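The workflow described above can be sketched with the AWS SDK for Python (boto3): search for a capacity offering, purchase it to obtain a plan ARN, then reference that reservation when configuring the endpoint. This is a minimal sketch, not the definitive API: the `search_training_plan_offerings` and `create_training_plan` operations exist in boto3, but the `"endpoint"` target-resource value and the `TrainingPlanArn` field on the production variant are assumptions for illustration, so field names should be checked against the current SageMaker API reference. The payloads are built as plain dicts so the request shapes are visible without calling AWS.

```python
# Sketch of reserving GPU capacity and deploying an endpoint into it.
# ASSUMPTIONS: the "endpoint" TargetResources value and the
# TrainingPlanArn field on the production variant are illustrative
# placeholders; verify field names against the SageMaker API docs.
from datetime import datetime, timedelta, timezone

start = datetime.now(timezone.utc) + timedelta(days=1)

# 1. Search for an offering matching the desired window and instances.
search_request = {
    "InstanceType": "ml.p5.48xlarge",
    "InstanceCount": 2,
    "StartTimeAfter": start,
    "DurationHours": 72,                 # time-bound, e.g. a benchmark run
    "TargetResources": ["endpoint"],     # assumed value for inference capacity
}

# 2. Purchase the chosen offering; SageMaker returns a plan ARN.
create_plan_request = {
    "TrainingPlanName": "llm-benchmark-plan",
    "TrainingPlanOfferingId": "<offering-id-from-search>",  # elided on purpose
}

# 3. Reference the reservation when creating the endpoint config.
#    Model versions and instance counts can later be updated within
#    the reservation's limits.
endpoint_config_request = {
    "EndpointConfigName": "llm-benchmark-config",
    "ProductionVariants": [{
        "VariantName": "primary",
        "ModelName": "my-llm-v1",
        "InstanceType": "ml.p5.48xlarge",
        "InitialInstanceCount": 2,
        # Assumed field tying the variant to the reserved capacity:
        "TrainingPlanArn": "arn:aws:sagemaker:us-east-1:111122223333"
                           ":training-plan/llm-benchmark-plan",
    }],
}

# In a real session these dicts would be passed to a boto3 client:
#   sm = boto3.client("sagemaker")
#   offerings = sm.search_training_plan_offerings(**search_request)
#   plan = sm.create_training_plan(**create_plan_request)
#   sm.create_endpoint_config(**endpoint_config_request)
```

Keeping the reservation window in the search request (rather than on the endpoint) matches the time-bound use cases the article lists: the capacity expires with the plan, while the endpoint config stays free to change within it.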
Why It Matters
Eliminates GPU availability uncertainty for critical AI deployments, letting teams focus on model performance instead of scrambling for infrastructure.