Developer Tools

Amazon SageMaker AI in 2025, a year in review part 2: Improved observability and enhanced features for SageMaker AI model customization and hosting

New instance-level metrics and rolling updates aim to solve hidden latency and risky deployments for production AI.

Deep Dive

Amazon's 2025 enhancements to SageMaker AI target two critical pain points in production AI deployment: observability and deployment safety. The 'Enhanced Metrics' feature provides granular, instance-level and container-level visibility into resource utilization (CPU, memory, GPU) and invocation performance (latency, errors), moving beyond aggregated endpoint-level data and letting teams pinpoint the specific instances responsible for latency spikes or errors. Complementing this, 'Rolling Updates' for inference components update models in configurable batches while integrated CloudWatch alarms monitor each batch; if an alarm fires, the system triggers an automatic rollback, enabling zero-downtime updates and reducing deployment risk. These features, enabled via API parameters such as `MetricsConfig`, are designed to support the scaling of complex, multi-model generative AI applications on AWS infrastructure.
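
The article does not show the full request shape, but opting into the granular metrics comes down to passing a metrics configuration when the endpoint resources are created or updated. The boto3 snippet below is a minimal sketch of that idea; the placement of `MetricsConfig` (here on an endpoint-config production variant) and its inner fields, including the granularity flag and publishing frequency, are assumptions for illustration based on the article's description, not the confirmed API shape.

```python
import boto3

sm = boto3.client("sagemaker")

# Minimal sketch: create an endpoint config that opts into instance-level
# metrics. The MetricsConfig placement and inner field names (Granularity,
# PublishingFrequencyInSeconds) are assumptions; check the current
# SageMaker API reference for the exact shape.
sm.create_endpoint_config(
    EndpointConfigName="llm-serving-config",          # hypothetical name
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-llm-model",               # hypothetical model
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 4,
            # Hypothetical enhanced-metrics block reflecting the article's
            # mention of a MetricsConfig parameter and a configurable
            # publishing frequency between 10 and 300 seconds.
            "MetricsConfig": {
                "Granularity": "INSTANCE",
                "PublishingFrequencyInSeconds": 60,
            },
        }
    ],
)
```

Once published, the per-instance series can be read back from CloudWatch alongside the existing endpoint-level metrics, so a single misbehaving instance can be alarmed on directly rather than inferred from aggregate latency.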

Key Points
  • Enhanced Metrics provide instance-level tracking of CPU, memory, and GPU utilization with configurable publishing frequencies (e.g., 10-300 seconds).
  • Rolling Updates for inference components deploy in batches with CloudWatch alarm monitoring that triggers automatic rollbacks, enabling safer deployments (sketched after this list).
  • The updates address previously hidden performance issues and reduce the need for duplicate infrastructure provisioning during model updates.
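
As a rough illustration of the rolling-update flow described above, the sketch below creates a CloudWatch alarm on invocation errors and then requests an inference-component update in small batches with that alarm wired in for automatic rollback. The alarm name, endpoint and model names, batch sizes, and thresholds are placeholders, and the `DeploymentConfig` field names follow the general shape of SageMaker's rolling-update configuration but should be verified against the current API reference.

```python
import boto3

cw = boto3.client("cloudwatch")
sm = boto3.client("sagemaker")

# Alarm on elevated 4xx invocation errors for the endpoint; the rolling
# update watches this alarm and rolls back if it fires. Names, dimensions,
# and thresholds are placeholders for illustration.
cw.put_metric_alarm(
    AlarmName="llm-ic-4xx-errors",
    Namespace="AWS/SageMaker",
    MetricName="Invocation4XXErrors",
    Dimensions=[
        {"Name": "EndpointName", "Value": "llm-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=3,
    Threshold=5,
    ComparisonOperator="GreaterThanThreshold",
)

# Roll out a new model version for the inference component one copy at a
# time, waiting between batches, and roll back automatically if the alarm
# above goes off. DeploymentConfig field names are assumed from the rolling
# update behavior the article describes.
sm.update_inference_component(
    InferenceComponentName="llm-inference-component",   # hypothetical name
    Specification={
        "ModelName": "my-llm-model-v2",                  # hypothetical model
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 16384,
        },
    },
    DeploymentConfig={
        "RollingUpdatePolicy": {
            "MaximumBatchSize": {"Type": "COPY_COUNT", "Value": 1},
            "WaitIntervalInSeconds": 300,
            "RollbackMaximumBatchSize": {"Type": "COPY_COUNT", "Value": 2},
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": "llm-ic-4xx-errors"}],
        },
    },
)
```

Because the update proceeds copy by copy on the existing fleet, the rollout avoids provisioning a full duplicate set of instances the way a blue/green swap would, which is the cost reduction the third key point refers to.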

Why It Matters

Provides enterprise teams with the granular visibility and deployment safety needed to run reliable, large-scale generative AI applications in production.